LINEAR ALGEBRA 
PROBLEM BOOK 


Paul R. Halmos 


THE 
DOLCIANI MATHEMATICAL EXPOSITIONS 


Published by 
THE MATHEMATICAL ASSOCIATION OF AMERICA 


Committee on Publications 
JAMES W. DANIEL, Chairman 


Dolciani Mathematical Expositions Editorial Board 
BRUCE P. PALKA, Editor 
CHRISTINE W. AYOUB 
IRL C. BIVENS 
BRUCE A. REZNICK 


The Dolciani Mathematical Expositions 


NUMBER SIXTEEN 


LINEAR ALGEBRA 
PROBLEM BOOK 


PAUL R. HALMOS 


Published and Distributed by 
THE MATHEMATICAL ASSOCIATION OF AMERICA 


©1995 by 
The Mathematical Association of America (Incorporated) 
Library of Congress Catalog Card Number 94-79588 


Complete Set ISBN 0-88385-300-0 
Vol. 16 ISBN 0-88385-322-1 


Printed in the United States of America 


Current printing (last digit): 
1098765432 


The DOLCIANI MATHEMATICAL EXPOSITIONS series of the Mathematical 
Association of America was established through a generous gift to the Association 
from Mary P. Dolciani, Professor of Mathematics at Hunter College of the City Uni- 
versity of New York. In making the gift, Professor Dolciani, herself an exceptionally 
talented and successful expositor of mathematics, had the purpose of furthering the 
ideal of excellence in mathematical exposition. 

The Association, for its part, was delighted to accept the gracious gesture initiating 
the revolving fund for this series from one who has served the Association with 
distinction, both as a member of the Committee on Publications and as a member of 
the Board of Governors. It was with genuine pleasure that the Board chose to name 
the series in her honor. 

The books in the series are selected for their lucid expository style and stimulating 
mathematical content. Typically, they contain an ample supply of exercises, many 
with accompanying solutions. They are intended to be sufficiently elementary for the 
undergraduate and even the mathematically inclined high-school student to understand 
and enjoy, but also to be interesting and sometimes challenging to the more advanced 
mathematician. 


1. Mathematical Gems, Ross Honsberger
2. Mathematical Gems II, Ross Honsberger
3. Mathematical Morsels, Ross Honsberger
4. Mathematical Plums, Ross Honsberger (ed.)
5. Great Moments in Mathematics (Before 1650), Howard Eves
6. Maxima and Minima without Calculus, Ivan Niven
7. Great Moments in Mathematics (After 1650), Howard Eves
8. Map Coloring, Polyhedra, and the Four-Color Problem, David Barnette
9. Mathematical Gems III, Ross Honsberger
10. More Mathematical Morsels, Ross Honsberger
11. Old and New Unsolved Problems in Plane Geometry and Number Theory, Victor
Klee and Stan Wagon
12. Problems for Mathematicians, Young and Old, Paul R. Halmos
13. Excursions in Calculus: An Interplay of the Continuous and the Discrete, Robert
M. Young
14. The Wohascum County Problem Book, George T. Gilbert, Mark I. Krusemeyer,
Loren C. Larson
15. Lion Hunting and Other Mathematical Pursuits: A Collection of Mathematics,
Verse, and Stories by Ralph P. Boas, Jr., Gerald L. Alexanderson and Dale H.
Mugler (eds.)
16. Linear Algebra Problem Book, Paul R. Halmos
17. From Erdös to Kiev: Problems of Olympiad Caliber, Ross Honsberger
18. Which Way Did the Bicycle Go? ... and Other Intriguing Mathematical Mysteries,
Joseph D. E. Konhauser, Dan Velleman, and Stan Wagon




Mathematical Association of America 
P. O. Box 91112 
Washington, DC 20090-1112 
1-800-331-1MAA FAX: 1-301-206-9789


PREFACE 


Is it fun to solve problems, and is solving problems about something a good 
way to learn something? The answers seem to be yes, provided the prob- 
lems are neither too hard nor too easy. 

The book is addressed to students (and teachers) of undergraduate lin- 
ear algebra—it might supplement but not (I hope) replace my old Finite- 
Dimensional Vector Spaces. It largely follows that old book in organization 
and level and order—but only "largely"—the principle is often violated. 
This is not a step-by-step textbook—the problems vary back and forth be- 
tween subjects, they vary back and forth from easy to hard and back again. 
The location of a problem is not always a hint to what methods might be 
appropriate to solve it or how hard it is. 

Words like “hard” and “easy” are subjective of course. I tried to make 
some of the problems accessible to any interested grade school student, 
and at the same time to insert some that might stump even a professional 
expert (at least for a minute or two). Correspondingly, the statements of 
the problems, and the introductions that precede and the solutions that 
follow them sometimes laboriously explain elementary concepts, and, at 
other times assume that you are at home with the language and attitude of 
mathematics at the research level. Example: sometimes I assume that you 
know nothing, and carefully explain the associative law, but at other times 
I assume that the word “topology”, while it may not refer to something that 
you are an expert in, refers to something that you have heard about. 

The solutions are intrinsic parts of the exposition. You are urged to 
look at the solution of each problem even if you can solve the problem with- 
out doing so—the solution sometimes contains comments that couldn’t be 
made in the statement of the problem, or even in the hint, without giving 
too much of the show away.

I hope you will enjoy trying to solve the problems, I hope you will learn
something by doing so, and I hope you will have fun.


CHAPTER 1 


SCALARS 


1. Double addition 


Is it obvious that 


63 + 48 = 27 + 84? 


It is a true and thoroughly uninteresting mathematical statement that can 
be verified in a few seconds—but is it obvious? If calling it obvious means 
that the reason for its truth is clearly understood, without even a single 
second’s verification, then most people would probably say no. 

What about 


(27 + 36) + 48 = 27 + (36 + 48) 


—is that obvious? Yes it is, for most people; the instinctive (and correct) 
reaction is that the way the terms of a sum are bunched together cannot 
affect the answer. The approved technical term is not “bunch together” but 
“associate”; the instinctive reaction is a readiness to accept what is called 
the associative law of addition for real numbers. (Surely every reader has 
noticed by now that the non-obvious statement and the obvious one are in 
some sense the same: 


63 = 27 + 36 and 84 = 36 + 48.)


Linear algebra is concerned with several different kinds of operations 
(such as addition) on several different kinds of objects (not necessarily real 
numbers). To prepare the ground for the study of strange operations and to 
keep the associative law from being unjustly dismissed as a triviality, a little 
effort to consider some good examples and some bad ones is worthwhile.
Some of the examples will be useful in the sequel, and some won’t; some
are here to show that associativity can fail, and others are here to show that 
even when it holds it may be far from obvious. In the world of linear algebra 
non-associative operations are rare, but associative operations whose good 
behavior is not obvious are more frequently met. 


Problem 1. If a new addition for real numbers, denoted by the temporary
symbol [+], is defined by


α [+] β = 2α + 2β,

is [+] associative?


Comment. The plus sign on the right-hand side of the equation denotes 
ordinary addition. 
Note: since ordinary addition is commutative, so that


2α + 2β = 2β + 2α,

it follows that


α [+] β = β [+] α.


Conclusion: the new addition is also commutative. 
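
A reader who likes to experiment before turning to a solution might try a brute-force check such as the following. It is only a sketch: the helper names `is_commutative`, `is_associative`, and `double_add` are invented for this illustration, and a check over a finite sample can refute a law but can never prove it.

```python
from itertools import product

def is_commutative(op, samples):
    """True if op(a, b) == op(b, a) for every pair drawn from samples."""
    return all(op(a, b) == op(b, a) for a, b in product(samples, repeat=2))

def is_associative(op, samples):
    """True if (a op b) op c == a op (b op c) for every triple drawn from samples."""
    return all(op(op(a, b), c) == op(a, op(b, c))
               for a, b, c in product(samples, repeat=3))

def double_add(a, b):
    # The operation of Problem 1: a [+] b = 2a + 2b.
    return 2 * a + 2 * b

samples = range(-3, 4)  # a handful of integers standing in for real numbers

print(is_associative(lambda a, b: a + b, samples))  # ordinary addition
print(is_commutative(double_add, samples))          # agrees with the Note above
print(is_associative(double_add, samples))          # the question of Problem 1
```

The same two helpers can be pointed at any of the operations in the next few problems by swapping in a different function for `double_add`.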


2. Half double addition 


Problem 2. If a new addition for real numbers, denoted by the temporary
symbol [+], is defined by


α [+] β = 2α + β,

is [+] associative?


Comment. Since 2α + β is usually different from 2β + α, this [+] is not
commutative.


3. Exponentiation 


Problem 3. If an operation for positive integers, denoted by the temporary
symbol *, is defined by


α * β = α^β,


is it commutative? Is it associative?


4. Complex numbers


Suppose that an operation is defined for ordered pairs of real numbers, 
that is, for objects that look like (α, β) with both α and β real, as follows:


(α, β) [+] (γ, δ) = (α + γ, β + δ).


Is it commutative? Sure, obviously; how could it miss? All it does is perform
the known commutative operation of addition of real numbers twice, once for
each of the two coordinates. Is it associative? Sure, obviously, for the same
reason.

The double addition operations in Problems 1 and 2 are artificial; they 
were cooked up to make a point. The operation of exponentiation in Problem 
3 is natural enough, and that is its point: "natural" operations can fail to be 
associative. The coordinatewise addition here defined for ordered pairs is a 
natural one also, but it is far from the only one that is useful. 


Problem 4. If an operation for ordered pairs of real numbers,
denoted by the temporary symbol [·], is defined by


(α, β) [·] (γ, δ) = (αγ − βδ, αδ + βγ),

is it commutative? Is it associative?


Comment. The reason for the use of the symbol [·] (instead of [+]) is
twofold: it is reminiscent of multiplication (instead of addition), and it avoids
confusion when the two operations are discussed simultaneously (as in many
contexts they must be).
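
The title of this section suggests where the formula comes from. As a small sanity check (a sketch only; `pair_mul` and `as_pair` are names made up here), the pair rule can be compared with the built-in complex multiplication of Python on one example.

```python
def pair_mul(p, q):
    # (a, b) [.] (c, d) = (ac - bd, ad + bc), the rule of Problem 4.
    (a, b), (c, d) = p, q
    return (a * c - b * d, a * d + b * c)

def as_pair(z):
    # Read a Python complex number as an ordered pair of reals.
    return (z.real, z.imag)

p, q = (1, 2), (3, -1)
print(pair_mul(p, q))                      # (5, 5)
print(as_pair(complex(*p) * complex(*q)))  # (5.0, 5.0)
```

One agreeing example proves nothing about commutativity or associativity, of course; it only shows that the pair (α, β) is behaving like α + βi.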


5. Affine transformations 


Looking strange is not necessarily a sign of being artificial or useless. 


Problem 5. If an operation for ordered pairs of real numbers,
denoted by [·] again, is defined by


(α, β) [·] (γ, δ) = (αγ, αδ + β),


is it commutative? Is it associative? 


6. Matrix multiplication 


The strange multiplication of Problem 5 is a special case of one that is more 
complicated but less strange.


Problem 6. If an operation for ordered quadruples of real numbers,
denoted by [·], is defined by


(α, β, γ, δ) [·] (α′, β′, γ′, δ′)
= (αα′ + βγ′, αβ′ + βδ′, γα′ + δγ′, γβ′ + δδ′),


is it commutative? Is it associative? 


Comment. How is the multiplication of Problem 5 for ordered pairs a "special
case" of this one? Easy: restrict attention to only those quadruples
(α, β, γ, δ) for which γ = 0 and δ = 1. The [·] product of two such
special quadruples is again such a special one; indeed if γ = γ′ = 0 and
δ = δ′ = 1, then γα′ + δγ′ = 0 and γβ′ + δδ′ = 1. The first two coordinates
of the product are αα′ and αβ′ + β, and that's in harmony with Problem 5.

Another comment may come as an additional pleasant surprise: the mul- 
tiplication of complex numbers discussed in Problem 4 is also a special case 
of the quadruple multiplication discussed here. Indeed: restrict attention to 
only those quadruples that are of the form 


(α, β, −β, α),


and note that


(α, β, −β, α) [·] (γ, δ, −δ, γ) = (αγ − βδ, αδ + βγ, −βγ − αδ, −βδ + αγ)


—in harmony with Problem 4. 
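
Both "special case" remarks can be watched in action on a pair of sample inputs. The following is a minimal sketch, with function names (`quad_mul`, `affine_mul`, `complex_mul`) chosen only for this illustration; it multiplies embedded pairs as quadruples and prints the result next to the product computed by the rules of Problems 5 and 4.

```python
def quad_mul(p, q):
    # The quadruple rule of Problem 6.
    a, b, c, d = p
    a2, b2, c2, d2 = q
    return (a * a2 + b * c2, a * b2 + b * d2,
            c * a2 + d * c2, c * b2 + d * d2)

def affine_mul(p, q):
    # Problem 5: (a, b) [.] (c, d) = (ac, ad + b).
    (a, b), (c, d) = p, q
    return (a * c, a * d + b)

def complex_mul(p, q):
    # Problem 4: (a, b) [.] (c, d) = (ac - bd, ad + bc).
    (a, b), (c, d) = p, q
    return (a * c - b * d, a * d + b * c)

p, q = (2, 3), (5, 7)

# Affine embedding (a, b) -> (a, b, 0, 1).
print(quad_mul((2, 3, 0, 1), (5, 7, 0, 1)), affine_mul(p, q))
# (10, 17, 0, 1) (10, 17)

# Complex embedding (a, b) -> (a, b, -b, a).
print(quad_mul((2, 3, -3, 2), (5, 7, -7, 5)), complex_mul(p, q))
# (-11, 29, -29, -11) (-11, 29)
```

In each case the first two coordinates of the quadruple product agree with the pair product, and the last two coordinates have the special form that the embedding requires.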


7. Modular multiplication 


Define an operation, denoted by [·], for the numbers 0, 1, 2, 3, 4, 5 as
follows: multiply as usual and then throw away multiples of 6. (The technical
expression is “multiply modulo 6”.) Example: 4 [·] 5 = 2 and 2 [·] 3 = 0.


Problem 7. Is multiplication modulo 6 commutative? Is it associative?
What if 6 is replaced by 7: do the conclusions for 6 remain
true or do they change?
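
"Throwing away multiples of 6" is what the remainder operator does. A tiny sketch (the name `mul_mod` is an assumption of this illustration) reproduces the two examples given above and can be reused with 7 in place of 6.

```python
def mul_mod(n):
    # Build the operation "multiply as usual, then discard multiples of n".
    return lambda a, b: (a * b) % n

mul6 = mul_mod(6)
print(mul6(4, 5), mul6(2, 3))  # 2 0
```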


8. Small operations 


Problem 7 shows that interesting operations can exist on small sets. Small 
sets have the added advantage that sometimes they can forewarn us about 
some dangers that become more complicated, and therefore harder to see,
when the sets get larger. Another reason small sets are good is that
operations on them can be defined in a tabular manner that is reassuringly
explicit.

Consider, for instance, the table

    ×  |  0  1  2
   ----+---------
    0  |  0  0  0
    1  |  0  1  2
    2  |  0  2  1

which defines multiplication modulo 3 for the numbers 0, 1, 2. The information
such tables are intended to communicate is that the product of the
element at the left of a row by the element at the top of a column, in that
order, is the element placed where that row and that column meet. Example:
2 × 2 = 1 modulo 3.

It might be worth remarking that there is also a useful concept of
addition modulo 3; it is defined by the table

    +  |  0  1  2
   ----+---------
    0  |  0  1  2
    1  |  1  2  0
    2  |  2  0  1

It's a remarkable fact that addition and multiplication modulo 3 possess 
all the usually taught properties of the arithmetic operations bearing the 
same names. They are, for instance, both commutative and associative, 
they conspire to satisfy the distributive law 


α × (β + γ) = (α × β) + (α × γ),


they permit unrestricted subtraction (so that, for example, 1 − 2 = 2), and
they permit division restricted only by the exclusion of the denominator 0
(so that, for example, 1/2 = 2). In a word (officially to be introduced and
studied later) the integers modulo 3 form a field. 
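
The two tables and the claims just made about them are small enough to be checked exhaustively. Here is one way that might be done, as a sketch; the names `table`, `add`, and `mul` are chosen for the illustration, and subtraction and division are interpreted as "solve for x".

```python
N = 3
elements = range(N)

def table(op):
    # Row a, column b holds op(a, b): left element first, top element second.
    return [[op(a, b) for b in elements] for a in elements]

add = lambda a, b: (a + b) % N
mul = lambda a, b: (a * b) % N

print(table(mul))  # [[0, 0, 0], [0, 1, 2], [0, 2, 1]]
print(table(add))  # [[0, 1, 2], [1, 2, 0], [2, 0, 1]]

# The distributive law, checked over all 27 triples.
print(all(mul(a, add(b, c)) == add(mul(a, b), mul(a, c))
          for a in elements for b in elements for c in elements))  # True

# 1 - 2 is the x with 2 + x = 1; 1/2 is the x with 2 * x = 1.
print([x for x in elements if add(2, x) == 1])  # [2]
print([x for x in elements if mul(2, x) == 1])  # [2]
```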

Problem 1 is about an operation that is commutative but not associa- 
tive. Can that phenomenon occur in small sets? 


Problem 8. Is there an operation in a set of three elements that is
commutative but not associative?


9. Identity elements


The commonly accepted attitudes toward the commutative law and the 
associative law are different. Many real life operations fail to commute; 
the mathematical community has learned to live with that fact and even to 
enjoy it. Violations of the associative law, on the other hand, are usually 
considered by specialists only. Having made the point that the associative 
law deserves respect, this book will concentrate in the sequel on associative 
operations only. The next job is to see what other laudable properties such 
operations can and should possess. 

The sum of 0 and any real number α is α again; the product of 1 and
any real number α is α again. The phenomenon is described by saying that
0 and 1 are identity elements (or zero elements, or unit elements, or neutral 
elements) for addition and multiplication respectively. An operation that 
has an identity element is better to work with than one that doesn’t. Which 
ones do? 


Problem 9. Which of the operations 
(1) double addition, 
(2) half double addition, 
(3) exponentiation, 
(4) complex multiplication, 
(5) multiplication of affine transformations, 
(6) matrix multiplication, 
and 
(7) modular addition and multiplication 
have an identity element? 


In the discussion of operations, in Problems 1-8, the notation and 
the language were both additive (+, sum) and multiplicative (x, prod- 
uct). Technically there is no difference between the two, but traditionally 
multiplication is the more general concept. In the definition of groups, for 
instance (to be given soon), the notation and the language are usually mul- 
tiplicative; the additive theory is included as a special case. A curious but 
firmly established part of the tradition is that multiplication may or may 
not be commutative, but addition always is. The tradition will be followed
in this book, with no exceptions. 

An important mini-theorem asserts that an operation can have at most 
one identity element. That is: if × is an operation and both ε and ε′ are
identity elements for it, so that


ε × α = α × ε = α and ε′ × α = α × ε′ = α

for all α, then


ε = ε′.


Proof. Use ε′ itself for α in the equation involving ε, and use ε for α in
the equation involving ε′. The conclusion is that ε × ε′ is equal to both ε
and ε′, and hence that ε and ε′ are equal to each other.


Comment. The proof just given is intended to emphasize that an identity 
is a two-sided concept: it works from both right and left. 
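
To see why two-sidedness matters, it may help to look at an operation in which it fails. The sketch below uses a made-up operation `second` (it simply returns its second argument) on a three-element set; the helper names are likewise inventions of this illustration.

```python
def left_identities(op, elements):
    # Elements e with op(e, a) == a for every a.
    return [e for e in elements if all(op(e, a) == a for a in elements)]

def right_identities(op, elements):
    # Elements e with op(a, e) == a for every a.
    return [e for e in elements if all(op(a, e) == a for a in elements)]

def identities(op, elements):
    # Two-sided identities: the mini-theorem says this list has length at most 1.
    right = right_identities(op, elements)
    return [e for e in left_identities(op, elements) if e in right]

elements = [0, 1, 2]
second = lambda a, b: b  # hypothetical operation: always return the second argument

print(left_identities(second, elements))   # [0, 1, 2]
print(right_identities(second, elements))  # []
print(identities(second, elements))        # []
print(identities(lambda a, b: (a * b) % 3, elements))  # [1]
```

Every element is a left identity for `second`, so one-sided identities need not be unique; the uniqueness proof uses both sides.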


10. Complex inverses 


Is there a positive integer that can be added to 3 to yield 8? Yes. 

Is there a positive integer that can be added to 8 to yield 3? No. 

In the well-known language of elementary arithmetic: subtraction 
within the domain of positive integers is sometimes possible and some- 
times not. 

Is there a real number that can be added to 5 to yield 0? Yes, namely 
−5. Every real number has a negative, and that fact guarantees that within
the domain of real numbers subtraction is always possible. (To find a num- 
ber that can be added to 8 to yield 3, first find a number that can be added 
to 3 to yield 8, and then form its negative.) 

The third basic property of operations that will be needed in what fol- 
lows (in addition to associativity and the existence of neutral elements) 
is the possibility of inversion. Suppose that * is an operation (a temporary 
impartial symbol whose role in applications could be played by either addi- 
tion or multiplication), and suppose that the domain of * contains a neutral 
element ε, so that ε * α = α * ε = α for all α. Under these circumstances
an element β is called an inverse of α (a * inverse) if


α * β = β * α = ε.


Obvious example: every real number α has a + inverse, namely −α. Worrisome
example: not every real number has a × inverse. The exception is
0; there is no real number β such that 0 × β = 1. That is the only exception:
if α ≠ 0, then the reciprocal α⁻¹ (= 1/α) is a × inverse. These examples are
typical. The use of additive notation is usually intended to suggest the
existence of inverses (+ inverses, negatives) for every element, whereas for
multiplicatively written operations some elements can fail to be invertible,
that is, can fail to possess inverses (× inverses, reciprocals).

The definition of * inverse makes sense in complete generality, but
it is useful only in case * is associative. The point is that for associative
operations an important mini-theorem holds: an element can have at most
one inverse. That is: if both β and γ are * inverses of α, so that

α * β = β * α = ε and α * γ = γ * α = ε,

then β = γ. Proof: combine all three, γ, and α, and β, in that order, and
use the associative law. Looked at one way the answer is

γ * (α * β) = γ * ε = γ,

whereas the other way it is

(γ * α) * β = ε * β = β.

The conclusion is that the triple combination γ * α * β is equal to both γ
and β, and hence that γ and β are equal to each other.


Problem 10. For complex multiplication (defined in Problem 4), 
which ordered pairs (α, β) are invertible? Is there an explicit formula
for the inverses of the ones that are? 


11. Affine inverses 


Problem 11. For the multiplication of affine transformations (defined
in Problem 5), which ordered pairs (α, β) are invertible? Is there
an explicit formula for the inverses of the ones that are?


12. Matrix inverses 


Problem 12. Which of the 2 × 2 matrices


( α  β )
( γ  δ )

(for which multiplication was defined in Problem 6) are invertible?
Is there an explicit formula for the inverses of the ones that are?


13. Abelian groups 


Numbers can be added, subtracted, multiplied, and (with one infamous
exception) divided. Linear algebra is about concepts called scalars and
vectors. Scalars are usually numbers; to understand linear algebra it is
necessary first of all to understand numbers, and, in particular, it is necessary to
understand what it means to add and subtract them. The general concept
that lies at the heart of such an understanding is that of abelian groups.

Consider, as an example, the set Z of all integers (positive, negative, 
or zero) together with the operation of addition. The sum of two integers 
is an integer such that: 


addition is commutative, meaning that the sum of two integers is
independent of the order in which they are added,


x + y = y + x;


addition is associative, meaning that the sum of three integers,
presented in a fixed order, is independent of the order in which the two
additions between them are performed,


(x + y) + z = x + (y + z);


the integer 0 plays a special role in that it does not change any integer
that it is added to,


x + 0 = 0 + x = x;


and every addition can be “undone” by another one, namely the addition
of the negative of what was just added,


x + (−x) = (−x) + x = 0.


This example is typical. The statements just made about Z and + are 
in effect the definition of the concept of abelian group. Almost exactly the 
same statements can be made about every abelian group; the only differ- 
ences are terminological (the words “integer” and “addition” may be re- 
placed by others) and notational (the symbols 0 and + may be replaced by 
others). 

Another example is the set Z₁₂ consisting of the integers between 0
and 11 inclusive, with an operation of addition (temporarily denoted by *)
defined this way: if the sum of two elements of Z₁₂ (in the ordinary meaning
of sum) is less than 12, then the new sum is equal to that ordinary sum,


x * y = x + y,


but if their ordinary sum is 12 or more, then the new sum is the ordinary
sum with 12 subtracted,


x * y = x + y − 12.

The operation * is called addition modulo 12, and is usually denoted by
just plain +, or, if desired, by + followed soon by an explanatory “mod
12”. The verification that the four typical sentences stated above for Z are
true for Z₁₂ is a small nuisance, but it’s painless and leads to no surprises.
(The closest it comes to a surprise is that the role of the negative of x is
played by 12 − x.)
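
The two-case definition and the remark about negatives can be tried out directly. What follows is a sketch with invented names (`add12`, `neg12`); note one small liberty: for x = 0 the expression 12 − x would give 12, which is not in the set, so the code reduces it modulo 12 and takes the negative of 0 to be 0.

```python
def add12(x, y):
    # The rule from the text: keep the ordinary sum if it is below 12,
    # otherwise subtract 12.
    s = x + y
    return s if s < 12 else s - 12

def neg12(x):
    # 12 - x for x != 0; the final "% 12" sends 0 to 0 rather than to 12.
    return (12 - x) % 12

Z12 = range(12)

print(add12(7, 8))  # 3
# The two-case rule agrees with the remainder operator on all 144 pairs.
print(all(add12(x, y) == (x + y) % 12 for x in Z12 for y in Z12))  # True
# Every element added to its negative gives 0.
print(all(add12(x, neg12(x)) == 0 for x in Z12))  # True
```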

Here is another example of an abelian group: the set R₊ of positive
real numbers, with an operation, temporarily denoted by *, defined as
ordinary numerical multiplication:


x * y = xy.


Everybody believes commutativity and associativity; the role of zero is
played this time by the real number 1,


x * 1 = 1 * x = x,


and the role of the negative of x is played by the reciprocal of x,


x * (1/x) = (1/x) * x = 1.


The general definition of an abelian group should be obvious by now: 
it is a set G with an operation of “addition” defined in it (so that whenever
x and y are in G, then x + y is again an element of G), satisfying the
four conditions discussed above. (They are: commutativity; associativity;
the existence of a zero; and, corresponding to each element, the existence 
of a negative of that element.) 
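
For a finite set the definition can be turned into an exhaustive test. The function below is a sketch under that finiteness assumption; the name `is_abelian_group` and its structure are inventions of this illustration, and it says nothing about infinite examples such as Z or the positive reals.

```python
from itertools import product

def is_abelian_group(elements, op):
    """Brute-force check of the four conditions (plus closure) on a finite set."""
    elements = list(elements)
    pairs = list(product(elements, repeat=2))

    # The operation must stay inside the set.
    if not all(op(x, y) in elements for x, y in pairs):
        return False
    commutative = all(op(x, y) == op(y, x) for x, y in pairs)
    associative = all(op(op(x, y), z) == op(x, op(y, z))
                      for x, y, z in product(elements, repeat=3))
    # Candidates for the role of zero.
    zeros = [e for e in elements
             if all(op(e, x) == x and op(x, e) == x for x in elements)]
    if not (commutative and associative and zeros):
        return False
    zero = zeros[0]
    # Every element needs a negative.
    return all(any(op(x, y) == zero and op(y, x) == zero for y in elements)
               for x in elements)

print(is_abelian_group(range(12), lambda x, y: (x + y) % 12))  # True
print(is_abelian_group(range(12), lambda x, y: (x * y) % 12))  # False
```

The second line fails because 0 has no multiplicative inverse. A call such as `is_abelian_group(range(1, 6), max)` is one way to experiment with finite questions of the kind that follow.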

The word “abelian” means exactly the same as “commutative”. If an 
operation in a set satisfies the last three conditions but not necessarily the 
first, then it is called a group. Non-commutative groups also enter the study 
of linear algebra, but not till later, and not as basically as the commutative 
ones. 


Problem 13. (a) If a new operation * is defined in the set R₊ of
positive real numbers by


x * y = min(x, y),

does R₊ become an abelian group?

(b) If an operation * is defined in the set {1, 2, 3, 4, 5} of positive
integers by

x * y = max(x, y),


does that set become an abelian group?

(c) If x and y are elements of an abelian group such that x + y = y,
does it follow that x = 0?


Comment. The abbreviations “min” and “max” are for minimum and 
maximum; min(2, 3) = 2, max(−2, −3) = −2, and min(5, 5) = 5.

Parts (a) and (b) of the problem test the understanding of the defini- 
tion of abelian group. The beginning of a systematic development of group 
theory (abelian or not) is usually a sequence of axiom splitting delicacies, 
which are fussy but can be fun. A sample is the mini-theorem discussed in 
Problem 9, the one that says that there can never be more than one ele- 
ment that acts the way 0 does. Part (c) of this problem is another sample 
of the same kind of thing. It is easy, but it’s here because it is useful and, 
incidentally, because it shows how the defining axioms of groups can be 
useful. What was proved in Problem 9 is that if an element acts the way 0 
does for every element, then it must be 0; part (c) here asks about elements 
that act the way 0 does for only one element. 


14. Groups 


According to the definition in Problem 13 a set endowed with an operation 
that has all the defining properties of an abelian group except possibly the 
first, namely commutativity, is called just simply a group. (Recall that the 
word “abelian” is a synonym for “commutative”.) Emphasis: the operation 
is an essential part of the definition; if two different operations on the same 
set both satisfy the defining conditions, the results are regarded as two 
different groups. 

Probably the most familiar example of an abelian group is the set Z 
of all integers (positive, negative, and zero), or, better said, the group is 
the pair (Z,+), the set Z together with, endowed with, the operation of 
addition. It is sometimes possible to throw away some of the integers and 
still have a group left; thus, for instance, the set of all even integers is a 
group. Throwing things away can, however, be dangerous: the set of pos- 
itive integers is not an additive group (there is no identity element: 0 is 
missing), and neither is the set of non-negative integers (once 0 is put back 
in it makes sense to demand inverses, but the demand can be fulfilled only 
by putting all the negative integers back in too). 

The set of real numbers with addition, in symbols (R, +), is a group,
but (R, ×), the set of real numbers with multiplication, is not: the number
0 has no inverse. The set of non-zero real numbers, on the other hand, is
a multiplicative group. The same comments apply to the set C of complex
numbers. The set of positive real numbers is a group with respect to
multiplication, but the set of negative real numbers is not: the product of two
of them is not negative.

Group theory is deep and pervasive: no part of mathematics is free of 
its influence. At the beginning of linear algebra not much of it is needed, 
but even here it is a big help to be able to recognize a group when one 
enters the room. 


Problem 14. Is the set of all affine transformations ξ ↦ αξ + β
(with the operation of functional composition) a group? What about
the set of all 2 × 2 matrices


( α  β )
( γ  δ )


(with matrix multiplication)? Is the set of non-zero integers modulo 
6 (that is: the set of numbers 1, 2, 3, 4, 5) a group with respect to mul- 
tiplication modulo 6? What if 6 is replaced by 7: does the conclusion 
remain true or does it change? 


Comment. The symbol ↦, called the barred arrow, is commonly used for
functions; it serves the purpose of indicating the “variable” that a function
depends on. To speak of “the function 2x + 3” is bad form; what the
expression 2x + 3 denotes is not a function but the value of a function at the
number x. Correct language speaks of the function


x ↦ 2x + 3,


which is an abbreviation for “the function whose value at each x is 2x + 3”.
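
Programming languages are forced to respect the same distinction, which may make it easier to remember. In the sketch below (the names `f` and `value` are arbitrary), the lambda expression plays the role of the barred arrow.

```python
f = lambda x: 2 * x + 3  # the function x |-> 2x + 3
value = f(4)             # the value of that function at the number 4

print(callable(f), callable(value))  # True False
print(value)                         # 11
```

The object `f` can be applied to numbers; the object `value` is just a number.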


15. Independent group axioms 


A group is a set with an operation that has three good properties, namely 
associativity, the existence of an identity element, and the existence of in- 
verses. Are those properties totally independent of one another, or is it the 
case that some of them imply some of the others? So, for example, must an 
associative operation always have an identity element? The answer is no, 
and that negative answer is one of the first things that most children learn 
about arithmetic. We all learn early in life that we can add two positive 
integers and get a third one, and we quickly recognize that that third one 
is definitely different from both of the numbers that we started with—the 
discovery of zero came to humanity long after the discovery of addition, 
and it comes similarly late to each one of us. Very well then, if we have an 
associative operation that does possess an identity element, does it follow
that every element has an inverse? The negative answer to that question
reaches most of us not long after our first arithmetic disappointment (see
Problem 13): in the set {0, 1, 2, 3, ...} we can add just fine, and 0 is an
identity element for addition, but + inverses are hard to come by; subtraction
cannot always be done. After these superficial comments there is really 
only one sensible question left to ask. 


Problem 15. Can there exist a non-associative operation with an 
identity element, such that every element has an inverse? 


16. Fields 


If, temporarily, “number” is interpreted to mean “integer”, then numbers 
can be added and subtracted and multiplied, but, except accidentally as 
it were, they cannot be divided. If we insist on dividing them anyway, we 
leave the domain of integers and get the set Q of all quotients of integers— 
in other words the set Q of rational numbers, which is a “field”. 

(Does everyone know about rational numbers? A real number is called
rational if it is the ratio of two integers. In other words, x is rational just
in case there exist integers m and n such that x = m/n. Examples: 2, −2,
0, 1, 10, —10, 55. Note: á and = are additional representations of the 
rational number 2 already mentioned; there are many others. Celebrated 
counterexamples: √2 and π. A proof that √2 is not rational was known to
humanity well over 2000 years ago; the news about π is only a little more
than 200 years old. That (Q, +) is a group needs to be checked, of course,
but the check is easy, and the same is true for (Q − {0}, ×).)

Probably the best known example of a field is the set R of real num- 
bers endowed with the operations of addition and multiplication. As far 
as addition goes, R is an abelian group, and so is Q. The corresponding 
statement for multiplication is not true; zero causes trouble. Since 0x = 0
for every real number x, it follows that there is no x such that 0x = 1; the
number 0 does not have a multiplicative inverse. If, however, R is replaced 
by the set R* of real numbers different from 0 (and similarly Q is replaced 
by the set Q* of rational numbers different from 0), then everything is all 
right again: R* with multiplication is an abelian group, and so is Q*. 

It surely does not come as a surprise that the same statements are true 
about the set C of all complex numbers with addition and multiplication, 
and, indeed, C is another example of a field. The properties of Q and R 
and C that have been mentioned so far, are, however, not quite enough to 
guess the definition of a field from. What the examples suggest is that a field 
has two operations; what they leave out is a connection between them. It is
mathematical malpractice to endow a set with two different structures that
have nothing to do with each other. In the examples already mentioned 
addition and multiplication together form a pleasant and useful conspiracy 
(in fact two conspiracies), called the distributive law (or laws): 


α(x + y) = αx + αy and (α + β)x = αx + βx,


and once that is observed the correct definition of fields becomes
guessable. A field is a set F with two operations + and × such that with + the
entire set F is an abelian group, with × the diminished set F* (omit 0) is
an abelian group, and such that the distributive laws are true.

(Is it clear what “0” means here? It is intended to mean the identity 
element of the additive group F. The notational conventions for real numbers
are accepted for all fields: 0 is always the additive neutral element, 1
is the multiplicative one, and, except at rare times of emphasis, multiplica- 
tion is indicated simply by juxtaposition.) 

Some examples of fields, less obvious than R, Q, and C, deserve mention. 
One good example is called Q(√2); it consists of all numbers of the 
form α + β√2 where α and β are rational; the operations are the usual 
addition and multiplication of real numbers. All parts of the definition, 
except perhaps one, are obvious. What may not be obvious is that every 
non-zero element of Q(√2) has a reciprocal. The proof that it is true 
anyway, obvious or no, is the process of rationalizing the denominator 
(used before in Solution 10). That is: to determine 1/(α + β√2), multiply 
both numerator and denominator by α − β√2 and get 


(α − β√2)/(α² − 2β²) = α/(α² − 2β²) − (β/(α² − 2β²))√2. 


The only thing that could possibly go wrong with this procedure is that the 
denominator α² − 2β² is zero, and that cannot happen (unless both α and 
β are zero to begin with); the reason it cannot happen is that √2 is not 
rational. 
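
The rationalizing procedure can be tried out on a machine. What follows is a minimal sketch, not part of the text: it assumes that an element α + β√2 of Q(√2) is stored as a pair (α, β) of exact rationals, and the names `reciprocal` and `multiply` are invented for the illustration.

```python
from fractions import Fraction

def reciprocal(a, b):
    """Return (c, d) with (a + b*sqrt2) * (c + d*sqrt2) == 1, for rational a, b."""
    a, b = Fraction(a), Fraction(b)
    denom = a * a - 2 * b * b      # zero only when a == b == 0, since sqrt2 is irrational
    if denom == 0:
        raise ZeroDivisionError("0 has no reciprocal")
    return a / denom, -b / denom

def multiply(p, q):
    """Product of p = (a, b) and q = (c, d), each standing for a + b*sqrt2."""
    (a, b), (c, d) = p, q
    return a * c + 2 * b * d, a * d + b * c

x = (Fraction(3), Fraction(5))     # the number 3 + 5*sqrt2
inv = reciprocal(*x)
print(inv, multiply(x, inv))
```

The product of the number and its computed reciprocal comes out as the pair (1, 0), that is, as the number 1, with no rounding involved.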

Could it happen in a field that the additive identity element is equal to 
the multiplicative one, that is that 0 = 1? That is surely not the intention of 
the definition. A legalistic loophole that avoids such degeneracy is to recall 
that a group is never the empty set (because, by assumption, it contains an 
identity element). It follows that if F is a field, then F − {0} is not empty. 
That does it: 1 is an element of F − {0}, and therefore 1 ≠ 0. 

If F is a field, then both F with + and F* with x are abelian groups, but 
neither of these facts has anything to do with the multiplicative properties 
of the additive identity 0. As far as they are concerned, do fields in general 
behave the way Q, R, and C do, or does the generality permit some unusual 
behavior? 


Problem 16. Must multiplication in a field be commutative? 


17. Addition and multiplication in fields 


If a question about the behavior of the elements of a field concerns only 
one of the two operations, it is likely to be easy and uninteresting. Example: 
is it true that if 


α + γ = β + γ, 


then α = β (additive cancellation law)? Answer: yes; just add −γ to both 
sides. A pleasant consequence (that sometimes bothers mathematical 
beginners): since 


α + (−α) = 0 


and 


(−α) + (−(−α)) = 0, 


the commutativity of addition and the additive cancellation law imply that 
−(−α) = α. 


The tricky and useful questions about fields concern addition and mul- 
tiplication simultaneously. 


Problem 17. Suppose that F is a field and α and β are in F. Which 
of the following plausible relations are necessarily true? 

(a) 0 × α = 0. 

(b) (−1)α = −α. 

(c) (−α)(−β) = αβ. 

(d) 1 + 1 ≠ 0. 

(e) If α ≠ 0 and β ≠ 0, then αβ ≠ 0. 


Comment. Observe that both operations enter into each of the five re- 
lations. (a) What is the multiplicative behavior of the additive unit? (b) 
What is the multiplicative behavior of the additive inverse of the multi- 
plicative unit? (c) What is the multiplicative behavior of additive inverses 
in general? (d) What is the additive behavior of the multiplicative unit? (e) 
What is the relation of multiplication to the additive unit? 


18. Distributive failure 


Problem 18. Is there a set F with three commutative operations +, 
×₁, and ×₂ such that (F, +) is a group, both (F − {0}, ×₁) and 
(F − {0}, ×₂) are groups, but only one of (F, +, ×₁) and (F, +, ×₂) 
is a field? 


19. Finite fields 


Could it happen that a field has only finitely many elements? Yes, it could; 
one very easy example is the set Z₂ consisting of only the two integers 0 
and 1 with addition and multiplication defined modulo 2. (Compare the 
non-field constructed in Problem 16.) 

The same sort of construction (add and multiply modulo something) 
yields the field Z₃ = {0, 1, 2} with addition and multiplication modulo 
3. If, however, 4 is used instead of 2 or 3, something goes wrong; the set 
Z₄ = {0, 1, 2, 3} with addition and multiplication modulo 4 is not a field. 
What goes wrong is not that 2 + 2 = 0 (there is nothing wrong with that) 
but that 2 × 2 = 0; that's bad. The reason it's bad is that it stops 2 from 
having a multiplicative inverse; the set Z₄* (= Z₄ − {0}) is not an abelian 
group. 

Further experimentation along these lines reveals that Z₅ is a field, but 
Z₆ is not; Z₇ is a field, but Z₈, Z₉, and Z₁₀ are not; Z₁₁ is a field, but Z₁₂ 
is not. (In Z₈, 2 × 4 = 0; in Z₉, 3 × 3 = 0; etc.) General fact (not hard to 
prove): Zₙ is a field if and only if the modulus n is a prime. 
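
The experimentation is easy to automate. Here is a small sketch, offered as an illustration only: the function names `is_field` and `is_prime` are made up for the purpose, and the check looks at nothing but the one condition that can fail for the integers modulo n, namely the existence of multiplicative inverses (the other field conditions hold for every modulus).

```python
def is_field(n):
    """True if every non-zero residue mod n has a multiplicative inverse mod n."""
    return n > 1 and all(
        any((a * b) % n == 1 for b in range(1, n))
        for a in range(1, n)
    )

def is_prime(n):
    """Trial division; adequate for the small moduli tried here."""
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

fields = [n for n in range(2, 13) if is_field(n)]
print(fields)
```

A brute-force search of this kind is evidence, not proof; it merely agrees, for the moduli tried, with the general fact about primes.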

The fact that Z₄ is not a field shows that a certain way of defining 
addition and multiplication for four elements does not result in a field. Is 
it possible that different definitions would lead to a different result? 


Problem 19. Is there a field with four elements? 


CHAPTER 2 


VECTORS 


20. Vector spaces 


Real numbers can be added, and so can pairs of real numbers. If R² is the 
set of all ordered pairs (α, β) of real numbers, then it is natural to define 
the sum of two elements of R² by writing 


(α, β) + (γ, δ) = (α + γ, β + δ) 


and the result is that R² becomes an abelian group. There is also a kind of 
partial multiplication that makes sense and is useful, namely the process 
of multiplying an element of R² by a real number and thus getting another 
element of R²: 


α(β, γ) = (αβ, αγ). 

The end result of these comments is a structure consisting of three parts: 
an abelian group, namely R², a field, namely R, and a way of multiplying 
the elements of the group by the elements of the field. 

For another example of the kind of structure that linear algebra stud- 
ies, consider the set P of all polynomials with real coefficients. The set P, 
endowed with the usual notion of addition of polynomials, is an abelian 
group. Just as in the case of R² there is a multiplication that makes useful 
sense, namely the process of multiplying a polynomial p by a real number: 


(αp)(x) = α · p(x). 
The result is, as before, a triple structure: an abelian group P, a field R, 
and a way of multiplying elements of P by elements of R. 
The modification of replacing the set P of all real polynomials (real 
polynomial is a handy abbreviation for “polynomial with real coefficients”) 
by the set P₃ of all real polynomials of degree less than or equal to 3 is 
sometimes more natural to use than the unmodified version. The sum of two 
elements of P₃ is again an element of P₃, and so is the product of an element 
of P₃ by a real number, and that's all there is to it. 

One more example, and that will be enough for now. This time let V be 
the set of all ordered triples (α, β, γ) of real numbers such that 


α + β + γ = 0. 
Define addition by 


(α, β, γ) + (α′, β′, γ′) = (α + α′, β + β′, γ + γ′), 
define multiplication by 


α(β, γ, δ) = (αβ, αγ, αδ), 


and observe that the result is always an ordered triple with sum zero. The 
result is, once again, a structure with three parts: an abelian group 
(namely V), a field (namely R), and a sensible way of multiplying group 
elements by field elements. 

The general concept of a vector space is an abstraction of examples such 
as the ones just seen: it is a triple consisting of an abelian group, a field, and a 
multiplication between them. Recall, however, that it is immoral, illegal, and 
unprofitable to endow a set with two or more mathematical structures without 
tightly connecting them, so that each of them is restricted in an essential way. 
(The best known instance where that somewhat vague commandment is reli- 
giously obeyed is the definition of a field in Problem 16: there is an addition, 
there is a multiplication, and there is the essential connection between them, 
namely the distributive law.) 

A vector space over a field F (of elements called scalars) is an addi- 
tive (commutative) group V (of elements called vectors), together with an 
operation that assigns to each scalar α and each vector x a product αx that 
is again a vector. For such a definition to make good mathematical sense, 
the operation (called scalar multiplication) should be suitably related to the 
three given operations (addition in F, addition in V, and multiplication in F). 
The conditions that present themselves most naturally are these. 

The vector distributive law: 


(α + β)x = αx + βx 


whenever α and β are scalars and x is a vector. (In other words, multiplication 
by a vector distributes over scalar addition.) 
The scalar distributive law: 


α(x + y) = αx + αy 


whenever α is a scalar and x and y are vectors. (In other words, multiplication 
by a scalar distributes over vector addition.) 
The associative law: 


(αβ)x = α(βx) 


whenever α and β are scalars and x is a vector. 
The scalar identity law: 


1x = x 


for every vector x. (In other words, the scalar 1 acts as the identity trans- 
formation on vectors). 
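
The four conditions lend themselves to a mechanical spot-check. The sketch below is an illustration under stated assumptions, not a proof of anything: it samples a few rational scalars and a few pairs of rationals, and the names `check_axioms`, `scale`, `bad_scale`, `scalars`, and `vectors` are invented for the purpose. The deliberately defective multiplication `bad_scale`, which leaves the second coordinate alone, is likewise made up here to show what a failure looks like.

```python
from fractions import Fraction
from itertools import product

def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def scale(a, x):
    """The usual scalar multiplication on pairs."""
    return (a * x[0], a * x[1])

def bad_scale(a, x):
    """A hypothetical pretender: the scalar acts on the first coordinate only."""
    return (a * x[0], x[1])

scalars = [Fraction(n, 2) for n in range(-3, 4)]
vectors = [(a, b) for a in scalars[::2] for b in scalars[::2]]

def check_axioms(mult, add, scalars, vectors):
    """Spot-check the four conditions on finite samples; returns a dict of booleans."""
    one = Fraction(1)
    return {
        "vector distributive": all(
            mult(a + b, x) == add(mult(a, x), mult(b, x))
            for a, b, x in product(scalars, scalars, vectors)),
        "scalar distributive": all(
            mult(a, add(x, y)) == add(mult(a, x), mult(a, y))
            for a, x, y in product(scalars, vectors, vectors)),
        "associative": all(
            mult(a * b, x) == mult(a, mult(b, x))
            for a, b, x in product(scalars, scalars, vectors)),
        "scalar identity": all(mult(one, x) == x for x in vectors),
    }

good = check_axioms(scale, add, scalars, vectors)
bad = check_axioms(bad_scale, add, scalars, vectors)
print(good)
print(bad)
```

A sample that passes proves nothing in general, but a single sample that fails is a genuine counterexample; that asymmetry is what makes such a harness useful for unmasking pretenders.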

(The reader has no doubt noticed that in scalar multiplication the 
scalar is always on the left and the vector on the right—since the other 
kind of multiplication is not even defined, it makes no sense to speak of 
a commutative law. Nothing is lost by this convention, and something is 
gained: the very symbol for a product indicates which factor is the scalar 
and which the vector.) 

Many questions can and should be asked about the conditions that 
define vector spaces: one worrisome question has to do with multiplication, 
and another one, easier, has to do with zero. 

Why, it is natural to ask, is a multiplicative structure not imposed on 
vector spaces? Wouldn't it be natural and useful to define (α, β) · (γ, δ) = 
(αγ, βδ) (similarly to how addition is defined in R²)? The answer is no. 
The trouble is that even after the zero element (that is, the element (0, 0)) 
of R² is discarded, the remainder does not constitute a group; a pair that 
has one of its coordinates equal to 0, such, for instance, as (1, 0), does not 
have an inverse. The same question for P is tempting: the elements of P can 
be multiplied as well as added. Once again, however, the result does not 
convert P into a field; the multiplicative inverse of a polynomial is very 
unlikely to be a polynomial. Examples such as P₃ add to the discouragement: 
the product of two elements of P₃ might not be an element of P₃ (the 
degree might be too large). The example of triples with sum zero is perhaps 
the most discouraging: the attempt to define the product of two elements 
of V collapses almost before it is begun. Even if both α + β + γ = 0 and 
α′ + β′ + γ′ = 0, it is a rare coincidence if also αα′ + ββ′ + γγ′ = 0. It 
is best, at this stage, to resist the temptation to endow vector spaces with a 
multiplication. 
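
Two of the discouragements can be seen in a few lines. This is only a sketch with invented names (`mul`, `samples`, `has_inverse`); the search over sample pairs is an illustration, and the real reason it comes up empty is that the second coordinate of (1, 0) · (γ, δ) is always 0.

```python
from fractions import Fraction

def mul(x, y):
    """Coordinatewise product of two tuples of equal length."""
    return tuple(a * b for a, b in zip(x, y))

# (1, 0) * (c, d) = (c, 0), which can never equal the would-be unit (1, 1).
samples = [(Fraction(c), Fraction(d)) for c in range(-3, 4) for d in range(-3, 4)]
has_inverse = any(mul((1, 0), y) == (1, 1) for y in samples)

# Two triples with sum zero whose coordinatewise product does not have sum zero.
u, v = (1, 1, -2), (2, -1, -1)
w = mul(u, v)
print(has_inverse, w, sum(w))
```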

Both the scalar 0 and the vector 0 have to do with addition; how do 
they behave with respect to multiplication? It is possible that they misbe- 
have, in a sense something like the one for vector multiplication discussed 
in the preceding paragraph, and it is possible that they are perfectly well 
behaved—which is true? 


Problem 20. Do the scalar zero law, 
0x = 0, 
and the vector zero law, 
a0 = 0, 


follow from the conditions in the definition of vector spaces, or could 
they be false? 


Comment. Note that in the scalar zero law the symbol 0 denotes a scalar 
on the left and a vector on the right; in the vector zero law it denotes a 
vector on both sides. 


21. Examples 


It is always important, in studying a mathematical structure, to be able to 
recognize an example as a proper one, and to recognize a pretender as one 
that fails in some respects. Here are a half dozen candidates that may be 
vector spaces or may be pretenders. 

(1) Let F be C, and let V also be the set C of complex numbers. Define 
addition in C the usual way, and let scalar multiplication (denoted by *) be 
defined as follows: 


α * x = α² · x. 


(2) Let F be a field, let V be F² (the set of all ordered pairs of elements 
of F), let addition in V be the usual one (coordinatewise), and define a new 
scalar multiplication by writing 


α * (β, γ) = (αβ, 0) 


(for all α, β, and γ). 
(3) Let F be the field of four elements discussed in Problem 19, let V 
be F² with the usual addition, and define scalar multiplication by 


α * (β, γ) = (αβ, αγ) if γ ≠ 0 
and 


α * (β, 0) = (α²β, 0). 
(4) Let F be R and let V be the set R₊ of all positive real numbers. 
Define the “sum”, denoted by α ⊞ β, of any two positive real numbers α 
and β, and define the “scalar product”, denoted by α ⊡ β, of any positive 
real number α by an arbitrary (not necessarily positive) real number β as 
follows: 


α ⊞ β = αβ 


and 


α ⊡ β = α^β. 

(5) Let F be C, and let V be C also. Vector addition is to be defined 
as the ordinary addition of complex numbers, but the product of a scalar 
α (in C) and a vector x (in C) is to be defined by forming the real part of 
α first. That is: 

α · x = (Re α)x. 

(6) Let F be the field Q of rational numbers, let V be the field R of real 
numbers and define scalar multiplication by writing 


α * x = αx 


for all α in Q and all x in R. 


Problem 21. Which of the defining conditions of vector spaces are 
missing in the examples (1), (2), (3), (4), (5), and (6)? 


22. Linear combinations 


The best known example of a vector space is the space R² of all ordered 
pairs of real numbers, such as 


(1, 1), 
(0, π), 
[pair illegible in the source], 
(0, −200),                    (*) 
[pair illegible in the source]. 
An example of a vector space, different from R² but very near to it in spirit, 
consists not of all ordered pairs of real numbers, but only some of them. 
That is, throw away most of the pairs in R²; typical among the ones to be 
kept are 


(0, 0), 
(−1/2, 1), 
(√5, −2√5),                    (***) 
[pair illegible in the source]. 


Are these four pairs in R² enough to indicate a pattern? Is it clear which 
pairs are to be thrown away and which are to be kept? The answer is: keep 
only the pairs in which the second entry is −2 times the first. Right? Indeed: 
0 = (−2) · 0, and 1 = (−2)(−1/2), and so on. Use R²₀ as a temporary symbol 
to denote this new vector space. 

Spaces such as R² and R²₀ are familiar from analytic geometry; R² is 
the Euclidean plane equipped with a Cartesian coordinate system, and R²₀ 
is a line in that plane, the line with the equation 2x + y = 0. It is often 
good to use geometric language in linear algebra; it is comfortable and it 
suggests the right way to look at things. 

A vector was defined as an element of a vector space—any vector 
space. Caution: the word “vector” is therefore a relative word—it changes 
its meaning depending on which vector space is under study. (It's like the 
word "citizen", which changes its meaning depending on which nation is 
being talked about.) Vectors in the particular vector spaces R² and R²₀ 
happen to be ordered pairs of real numbers, and the two real numbers that 
make up a vector are called its coordinates. Each of the five pairs in the 
list (*) is a vector in R² (but none of them belongs to R²₀), and each of the 
four pairs in the list (***) is a vector in R²₀. 

The most important aspect of vectors is not what they look like but 
what one can do with them, namely add them, and multiply them by scalars. 
More generally, if x = (α₁, α₂) and y = (β₁, β₂) are vectors (either both 
in R² or else both in R²₀) and if ξ and η are real numbers, then it is possible 
to form 


ξx + ηy = (ξα₁ + ηβ₁, ξα₂ + ηβ₂),                    (**) 


which is a vector in the same space called a linear combination of the given 
vectors x and y, and that’s what a lot of the theory of vector spaces has to 
do with. Example: since 


3(4, 0) − 2(0, 5) = (12, −10), 


the vector (12, −10) is a linear combination of the vectors (4, 0) and (0, 5). 
Even easier example: the vector (7,7) is a linear combination of the vectors 
(1,0) and (0,1); indeed 


(7,7) = 7(1,0) + 7(0,1). 


This easy example has a very broad and completely obvious generalization: 
every vector (α, β) is a linear combination of (1, 0) and (0, 1). Proof: 


(α, β) = α(1, 0) + β(0, 1). 
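
The arithmetic of linear combinations is routine enough to delegate. A minimal sketch follows; the helper name `lin_comb` is invented here, and vectors are assumed to be stored as tuples of numbers.

```python
def lin_comb(coeffs, vectors):
    """Return the sum of coeffs[i] * vectors[i], computed coordinate by coordinate."""
    n = len(vectors[0])
    return tuple(sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(n))

first = lin_comb([3, -2], [(4, 0), (0, 5)])    # the example 3(4, 0) - 2(0, 5)
second = lin_comb([7, 7], [(1, 0), (0, 1)])    # the "even easier" example
print(first, second)
```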


Problem 22. Is (2, 1) a linear combination of the vectors (1, 1) 
and (1, 2) in R²? Is (0, 1)? More generally: which vectors in R² 
are linear combinations of (1, 1) and (1, 2)? 


Comment. It is important to remember that 0 is a perfectly respectable scalar, 
so that, in particular, (1, 1) is a linear combination of (1, 1) and (1, 2): 


(1, 1) = 1 · (1, 1) + 0 · (1, 2), 


and so is (0, 0): 


(0, 0) = 0 · (1, 1) + 0 · (1, 2). 


23. Subspaces 


The discussion of Problem 21 established that the four axioms that define 
vector spaces (the vector and scalar distributive laws, the associative law, and 
the scalar identity law) are independent. According to the official definition, 
therefore, a vector space is an abelian group V on which a field F “acts” so 
that the four independent axioms are satisfied. The set P of all polynomials 
with, say, real coefficients, is an example of a real vector space, and so is 
the subset P₃ of all polynomials of degree less than or equal to 3. The set 
R³ of all ordered triples of real numbers is a real vector space, and so is the 
subset V consisting of all ordered triples with sum zero. (See Problem 20.) 
The subset Q³ of R³ consisting of all triples with rational coordinates is not 
a vector space over R (an irrational number times an element of Q³ might 
not belong to Q³), and the subset X of R³ consisting of all triples with at 
least one coordinate equal to 0 is not a vector space (the sum of two elements 
of X does not necessarily belong to X). 
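
The contrast between the well-behaved subset V and the misbehaving subset X can be made concrete. The sketch below uses invented names (`in_X`, `in_V`, `add`) and particular triples chosen only for illustration.

```python
def in_X(v):
    """Membership in X: at least one coordinate equals 0."""
    return any(c == 0 for c in v)

def in_V(v):
    """Membership in V: the coordinates add up to 0."""
    return sum(v) == 0

def add(x, y):
    return tuple(a + b for a, b in zip(x, y))

x, y = (1, 0, 0), (0, 1, 1)          # both in X
u, v = (1, 2, -3), (4, -4, 0)        # both in V
print(in_X(x), in_X(y), in_X(add(x, y)))
print(in_V(u), in_V(v), in_V(add(u, v)))
```

The sum of the two sample elements of X is (1, 1, 1), which has no zero coordinate; the sum of the two sample elements of V still has coordinate sum zero.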

These examples illustrate and motivate an important definition: a non- 
empty subset M of a vector space V is a subspace of V if the sum of two 
vectors in M is always in M and if the product of a vector in M with 
every scalar is always in M. An equivalent way of phrasing the definition is 
this: a non-empty set M is a subspace if and only if αx + βy belongs to M 
whenever x and y are vectors in M and α and β are arbitrary scalars, or, 
in other words, subspaces are just the non-empty subsets closed under the 
formation of linear combinations. 

The set O consisting of the zero vector alone is a subspace of every 
vector space V (it is usually referred to as the trivial subspace—the oth- 
ers are called non-trivial), and so is the entire space V. The way the words 
"subset" and "subspace" are used is intended to allow these extremes. (A 
subspace of V different from V is called a proper subspace—in this lan- 
guage V itself is called the improper subspace.) To get more interesting 
examples of subspaces, it's a good idea to enlarge the stock of examples of 
vector spaces. 

It has already been noted (see Solution 20) that every field is a vector 
space over itself. In particular, R is a vector space over R, but, and this is 
more interesting, R is a vector space over Q also—just forget how to mul- 
tiply real numbers by anything except rational numbers. In this situation, 
where R is regarded as a rational vector space, the subset Q of R is a new 
example of a subspace, and so is the larger subset Q(√2) (see Problem 16). 
In the same spirit, C (with the operation of addition) is a vector space over 
C, and it is also a vector space over R; from the latter point of view, the set 
R is a subspace (a real subspace of C). 

Usually when vector spaces are discussed a field F has been fixed once 
and for all, and it is clear that all vector spaces under consideration are over 
F. If, however, there is some chance that the underlying field may have to 
be changed during the discussion, then it is necessary to specify the field 
each time. One way to do that is to speak of an F vector space; this is the 
general form of speaking of rational, real, and complex vector spaces. 

If F is a field and n is a positive integer, then the set of all n-tuples 
(ξ₁, ξ₂, ..., ξₙ) of elements of F is an F vector space (the addition of the 
n-tuples that play the role of vectors is coordinate by coordinate, and so is 
multiplication by an element of F); this space is denoted by Fⁿ. The set of 
those n-tuples whose first coordinate is equal to 0 is a subspace. 

Here is an important non-trivial example: the set of all real-valued 
functions defined on, say, a closed interval is a vector space over R if vec- 
tor addition and scalar multiplication are defined in the obvious pointwise 
fashion. The set of all continuous functions is an example of a subspace 
of that space. A different generalization of Rⁿ is the set of all infinite 
sequences (ξ₁, ξ₂, ξ₃, ...) of real numbers; an example of a subspace is the 
subset consisting of all those sequences for which the series ξ₁ + ξ₂ + ξ₃ + ⋯ 
is convergent, and a subspace of that subspace is the subset of all those 
sequences for which the series is absolutely convergent. 

For examples with a more geometric flavor, consider the real vector 
space R² and in it the subset M of all vectors of the form (α, 2α), where α 
is an arbitrary real number. Equivalently: M consists of the vectors whose 
second coordinate is equal to twice the first; in the usual language of 
analytic geometry the elements of M are the points on the line through the 
origin with slope 2, the line described by the equation y = 2x. (Examples 
like this are the reason why linear algebra is called linear: the expression 
refers to the algebra of lines and their natural higher-dimensional 
generalizations.) The example is typical: every straight line through the origin is 
a subspace of R², and every non-trivial proper subspace of R² is like that. 
The generalization of these examples to R³ is straightforward: the 
non-trivial proper subspaces of R³ are the lines and planes through the origin. 

What's special about the origin? Answer: it necessarily belongs to ev- 
ery subspace. Proof: if M is a subspace, and if x is an arbitrary element of 
M, then 0 · x belongs to M (scalar multiples), and since it is already known 
that 0 · x = 0, it follows that 0 ∈ M for all M. The definition of subspaces 
could have been formulated this way: a subset M of V is a subspace if M 
itself is a vector space with respect to the same linear operations (vector 
addition and scalar multiplication, or, in one phrase, linear combination) 
as are given in V. Since every vector space contains its zero vector, the 
presence of 0 in M should not come as a surprise. 


Problem 23. (a) Consider the complex vector space C³ and the 
subsets M of C³ consisting of those vectors (α, β, γ) for which 


(1) α = 0, 
(2) β = 0, 
(3) α + β = 1, 
(4) α + β = 0, 
(5) α + β ≥ 0, 
(6) α is real. 


In which of these cases is M a subspace of C³? 

(b) Consider the complex vector space P and the subsets M of 
all those vectors (polynomials) p for which 

(1) p has degree 3, 

(2) 2p(0) = p(1), 

(3) p(t) ≥ 0 whenever 0 ≤ t ≤ 1, 

(4) p(t) = p(1 — t) for all t. 
In which of these cases is M a subspace of P? 


24. Unions of subspaces 


What set-theoretic operations on subspaces produce further examples of 
subspaces? One that surely does not is set-theoretic complementation: the 
vectors that do not belong to a specified subspace never form a subspace. 
To become convinced of that, think of a picture, in the plane for instance: 
the complement of a line is not a line. To give a brisk proof, just think of 0: 
the complement of a subspace never contains it. 


Problem 24. (a) Under what conditions is the set-theoretic inter- 
section of two subspaces a subspace? What about the intersection of 
more than two subspaces (perhaps even infinitely many)—when is 
that a subspace? 

(b) Under what conditions is the set-theoretic union of two sub- 
spaces a subspace? What about the union of more than two sub- 
spaces? 


25. Spans 


Do linear combinations of more than two vectors make sense? Sure. If, for 
instance, x, y, and z are three vectors in R², or, for that matter, in R³ or in 
R²₀ (see Problem 22) and if α, β, and γ are scalars, then the vector 


αx + βy + γz 


is a linear combination of the set {x, y, z}. Linear combinations of sets of 
four vectors, such as {x₁, x₂, x₃, x₄}, are defined similarly as vectors of the 
form 


α₁x₁ + α₂x₂ + α₃x₃ + α₄x₄ 


(where α₁, α₂, α₃, and α₄ are scalars, of course), and the same sort of 
definition is used for linear combinations of any finite set of vectors. Since 


αx + βy + γz = 1 · (αx + βy) + γ · z, 


it is clear that a linear combination of three vectors can be obtained in two 
steps by forming linear combinations of two vectors: the first step yields 
αx + βy and the second step forms a linear combination of that and z. The 
same thing is true in complete generality: every finite linear combination 
can be obtained in a finite number of steps by forming linear combinations 
of two vectors at a time. 
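
The two-at-a-time construction can be written out as a short sketch. The names `comb2` and `lin_comb` are invented for the illustration, vectors are assumed to be tuples of numbers, and the particular vectors and coefficients are arbitrary.

```python
from functools import reduce

def comb2(a, x, b, y):
    """Linear combination a*x + b*y of two vectors given as tuples."""
    return tuple(a * xi + b * yi for xi, yi in zip(x, y))

def lin_comb(coeffs, vectors):
    """Build a1*x1 + ... + an*xn using only two-vector combinations."""
    pairs = list(zip(coeffs, vectors))
    a0, x0 = pairs[0]
    start = comb2(a0, x0, 0, x0)               # a0*x0 + 0*x0
    return reduce(lambda acc, p: comb2(1, acc, p[0], p[1]), pairs[1:], start)

x, y, z = (1, 0, 2), (0, 1, 1), (4, 4, 4)
direct = tuple(2 * a + 3 * b - c for a, b, c in zip(x, y, z))
stepwise = lin_comb([2, 3, -1], [x, y, z])
print(direct, stepwise)
```

Each pass through the reduction forms 1 · (what has been accumulated) + (next coefficient) · (next vector), which is exactly the step displayed in the text.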

A vector space of interest is the set P₅ of all real polynomials p in one 
variable x, of degree less than or equal to 5 (an obvious relative of the 
space P₃ considered in Problem 20). Examples of such polynomials are 


pr)-z-c z^, 
p(z) = —r + r’, 
p(z) =7 +r + (V2 - e")2?, 


p(x) = 7, 
p(z) = 0, 
p(z) = a*, 


and it is clear that to get more, richer, examples of vectors in P₅, “long” 
linear combinations of examples such as these need to be formed. 

Objects that naturally arise in this connection are the large sets of 
vectors that can be obtained from small sets by forming all possible linear 
combinations. Problem 22, for instance, asked which vectors in R² are linear 
combinations of (1, 1) and (1, 2), and the answer turned out to be that every 
vector in R² is such a linear combination. A similar question is this: which 
vectors in R³ are linear combinations of (1, 1, 0) and (1, 2, 0)? The 
solution of Problem 22 makes the answer to this question obvious: the answer 
is all vectors of the form (x, y, 0). The technical word for “set of all linear 
combinations” is span. So, for example, the span of the vectors (1, 1, 0) and 
(1, 2, 0) is the set of all vectors of the form (ξ, η, 0), or, to say the same thing 
in different words, the set {(1, 1, 0), (1, 2, 0)} spans the set of all vectors of 
the form (ξ, η, 0). 

In geometrical language R³ is 3-dimensional Euclidean space. In that 
space the set of all those vectors (ξ, η, ζ) for which η = ζ = 0, or, in other 
words, the set of all (ξ, 0, 0), is called the ξ-axis, and, similarly, the η-axis is 
the set of all (0, η, 0), and the ζ-axis is the set of all (0, 0, ζ). These 
coordinate axes are lines. The coordinate planes are the (ξ, η)-plane, which is the 
set of all (ξ, η, ζ) with ζ = 0, or, in other words, the set of all (ξ, η, 0), and, 
similarly, the (η, ζ)-plane, which is the set of all (0, η, ζ), and the 
(ξ, ζ)-plane, which is the set of all (ξ, 0, ζ). In this language: the ξ-axis and the 
ζ-axis span the (ξ, ζ)-plane, and the set {(1, 1, 0), (1, 2, 0)} spans the 
(ξ, η)-plane. 

What is the span of the set {(1, 1, 1), (0, 0, 0)}? Answer: it is the set of 
all vectors of the form (ξ, ξ, ξ), or, geometrically, it is the line through the 
origin that makes the same angle with each of the three coordinate axes 
(the angle whose cosine is 1/√3, about 54.7°). 

How about this: does the vector (1, 4, 9) in R³ belong to the span of 
{(1, 1, 1), (0, 1, 1), (0, 0, 1)}? The answer is probably not obvious, but it is 
not difficult to get. If (1, 4, 9) did belong to the span of 
{(1, 1, 1), (0, 1, 1), (0, 0, 1)}, 
then scalars α, β, and γ could be found so that 
α(1, 1, 1) + β(0, 1, 1) + γ(0, 0, 1) = (1, 4, 9), 


and then it would follow that 
α = 1, 
α + β = 4, 
α + β + γ = 9. 


This in turn implies that 
α = 1, 
β = 4 − α = 4 − 1 = 3, 
γ = 9 − α − β = 9 − 1 − 3 = 5. 


Check: 1 · (1, 1, 1) + 3 · (0, 1, 1) + 5 · (0, 0, 1) = (1, 4, 9). 
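
The same three steps, one coordinate at a time, can be sketched in code. The function name `solve_triangular` is invented for the illustration; it is tailored to the three particular vectors (1, 1, 1), (0, 1, 1), (0, 0, 1) and is not a general equation solver.

```python
def solve_triangular(target):
    """Coefficients (a, b, c) with a*(1,1,1) + b*(0,1,1) + c*(0,0,1) == target."""
    t1, t2, t3 = target
    a = t1               # first coordinates:  a         = t1
    b = t2 - a           # second coordinates: a + b     = t2
    c = t3 - a - b       # third coordinates:  a + b + c = t3
    return a, b, c

a, b, c = solve_triangular((1, 4, 9))
check = tuple(a * u + b * v + c * w for u, v, w in zip((1, 1, 1), (0, 1, 1), (0, 0, 1)))
print((a, b, c), check)
```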

Among the simplest of the polynomials (vectors) in the vector space P₅ 
are 1, x², and x⁴. What is their span? Answer: it is the set of all polynomials 
of the form 


α + βx² + γx⁴. 


These polynomials happen to have a pleasant property that characterizes them: 
the replacement of x by −x does not change them. Polynomials with this 
property are called even. Symbolically said: a polynomial p is even if it 
satisfies the identity p(−x) = p(x). A polynomial p is called odd if it 
satisfies the identity p(−x) = −p(x). What do the odd polynomials in P₅ 
look like? 
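
Evenness and oddness are easy to test once a polynomial is written down by its coefficients. In the sketch below a polynomial in P₅ is assumed to be stored as the list [c0, c1, ..., c5] of its coefficients; that representation and the names `reflect`, `is_even`, and `is_odd` are choices made for the illustration.

```python
def reflect(coeffs):
    """Coefficients of p(-x), given the coefficients [c0, c1, ...] of p(x)."""
    return [(-1) ** k * c for k, c in enumerate(coeffs)]

def is_even(coeffs):
    return reflect(coeffs) == list(coeffs)

def is_odd(coeffs):
    return reflect(coeffs) == [-c for c in coeffs]

p = [3, 0, -2, 0, 7, 0]      # 3 - 2x^2 + 7x^4, in the span of 1, x^2, x^4
q = [1, 1, 0, 0, 0, 0]       # 1 + x
print(is_even(p), is_odd(p), is_even(q), is_odd(q))
```

The second sample shows that a polynomial need be neither even nor odd; the two properties are not alternatives that exhaust P₅.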


Problem 25. (a) Can two disjoint subsets of R², each containing 
two vectors, have the same span? (b) What is the span in R³ of 


{(1, 1, 1), (0, 1, 1), (0, 0, 1)}? 


26. Equalities of spans 


Span is a set-theoretic operation that converts sets (of vectors) into other 
sets (subspaces). (In other words, “span” applies to sets of vectors, not to 
vectors themselves, and an expression such as “the span of the vectors x 
and y” is not really a proper one.) What can be said about the relation 
between a set of vectors and its span? There are three easy statements on 
the most abstract (and therefore most shallow) level, namely that 


(1) every set is a subset of its span, 

(2) if a set E is a subset of a set F, then the span of E is a subset of the span 
of F, 

and 

(3) the span of the span of a set is the same as the span of the set. 


It is often convenient to have a symbol to denote “span”, and one 
possible symbol is ⋁ (which is intended to be reminiscent of the ordinary 
set-theoretic symbol for union). In terms of that symbol the statements just 
made can be expressed as follows: 


E ⊂ ⋁E,                    (1) 


if E ⊂ F, then ⋁E ⊂ ⋁F,                    (2) 


and 


⋁⋁E = ⋁E.                    (3) 


In technical language (which is not especially useful here) (1) says that the 
span operation is increasing, (2) says that it is monotone, and (3) says that 
it is idempotent. 
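
The three statements can be watched in action in a setting small enough to list every vector. The sketch below makes an assumption that the text does not: it works over the two-element field, in the space of triples of 0's and 1's, so that a span is a finite set that can be computed by brute force. The name `span` and the sample sets E and F are invented for the illustration.

```python
from itertools import product

def span(vectors, n=3):
    """All linear combinations, over the two-element field, of vectors in Z_2^n."""
    vectors = list(vectors)
    result = set()
    for coeffs in product((0, 1), repeat=len(vectors)):
        combo = tuple(
            sum(c * v[i] for c, v in zip(coeffs, vectors)) % 2 for i in range(n)
        )
        result.add(combo)
    return result

E = {(1, 0, 0), (1, 1, 0)}
F = E | {(0, 0, 1)}
increasing = E <= span(E)
monotone = span(E) <= span(F)
idempotent = span(span(E)) == span(E)
print(increasing, monotone, idempotent)
```

One example confirms nothing in general, of course; the point is only to see what each of the three inclusions says about concrete sets.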

Knowledge about the span of a set of vectors provides geometric in- 
sight about the set, and, for instance, the knowledge that two sets have the 
same span (compare Problem 25 (a)) provides geometric insight about the 
relations between them. Here is an example of the kind of question about 
some spans that might arise: if we know about three vectors x, y, and z 
that x ∈ ⋁{y, z}, are we allowed to infer that ⋁{x, z} = ⋁{y, z}? The 
answer is no. If, for instance, x is a scalar multiple of z but y is not, then x 
obviously belongs to ⋁{y, z}, but y does not belong to ⋁{x, z}. 

A related question is this: if M is a subspace and x and y are vectors 
such that 


x ∈ ⋁{M, y}, 
does it follow that 
⋁{M, x} = ⋁{M, y}? 


(Here {M, x} is an abbreviation for M ∪ {x}.) The answer is trivially no: it 
could, for instance, happen that M is the subspace spanned by x, in which 
case the assumption is obviously true, but the questioned conclusion can 
be true only if y belongs to that subspace, which it may fail to do. Is that 
the only thing that can go wrong? 


Problem 26. If x and y are vectors and M is a subspace such that 
x ∉ M but x ∈ ⋁{M, y}, does it follow that 


⋁{M, x} = ⋁{M, y}? 


27. Some special spans 


If x is a vector in V, what is the intersection of all the subspaces of V that 
contain x? (Caution: are there any?) In view of Problem 24, one thing is for 
sure: that intersection, call it M, is a subspace. Since, moreover, M is the 
intersection of sets (subspaces) each of which contains x, it too contains x. 
What else does it have to contain? Answer: since a subspace containing x 
must contain all scalar multiples of x, it follows that αx is in M for every α. 
The set M₀ of all scalar multiples of x is itself a subspace, and the preceding 
sentence says exactly that M₀ ⊂ M. Since, moreover, the subspace M₀ 
contains x, so that it is a member of the collection that was intersected to 
form M, it follows that M ⊂ M₀. Consequence: M = M₀. 

The same argument can be applied to two vectors as easily as to one. 
If x and y are in V, there surely exist subspaces that contain them both 
(V is one), and the intersection of all those subspaces is a subspace M that 
contains both. Being a subspace, it contains all linear combinations αx + βy 
also. The set M₀ of all such linear combinations is itself a subspace, the one 
that was called the span of {x, y} in Problem 25, and, obviously, M₀ ⊂ M. 
Argue as above and conclude that M = M₀. 

These examples are special cases of a general concept that applies to
arbitrary subsets of a vector space. If E ⊂ V (the set E can be a singleton
{x}, a pair {x, y}, or, for that matter, an arbitrary finite or infinite set), the
intersection, call it M, of all subspaces that include E is a subspace. (Recall
that there always exists at least one subspace that includes E, namely V.)
The argument given in the preceding paragraphs can be given again and it
proves that M = ⋁E.

Is the last sentence correct? There is a curious degenerate case to be 
considered: what is the span of the empty set of vectors? In view of the def- 
inition, the question is this: which vectors can be obtained as linear combi- 
nations of no vectors at all? This formulation calls attention to a blemish 
of the definition; it doesn't apply to the conceptually trivial but technically 
very important empty set. The cure is to rephrase the definition: the span of
a set of vectors is the intersection of all the subspaces that include the set.
It is a non-profound exercise to show that for non-empty sets the rephras- 
ing is equivalent to the original definition; its virtue is that it applies with 
no change to every set—including, in particular, the empty set. Since every 
subspace includes the empty set, and, in particular, the trivial subspace O 
includes the empty set, it follows that the span of the empty set is O. The 
result is worth mentioning, if only to show that the concept of span works 
smoothly in all cases, with no troublesome exceptions. 

The span of a set (finite or infinite) consists, by definition, of the set of 
all linear combinations of elements of the set. There is a way of saying the 
same thing that uses exactly one more word and that word seems to come 
naturally to some people: they say that the span of a set consists of the set 
of all finite linear combinations of elements of the set. The added word 
“finite” is harmless, in the sense that it doesn’t change the meaning of the 
sentence (no other kind of linear combination has been defined), but at 
the same time it might be harmful because it suggests that “infinite linear 
combinations” could have been considered but were deliberately excluded. 
That is not true, and that way confusion lies. 

To get acquainted with the notion of span it is a good idea to look at a 
handful of special cases, various more or less randomly selected (small or 
large) subsets of R² or R³ or P, and examine their spans.


Problem 27. (a) Is there a vector that spans R²? (b) Are there two
vectors that span R²? (c) Are there two vectors that span R³? (d) Is
there any finite set of vectors that spans P?


28. Sums of subspaces 


Which vectors in R² can be obtained from the two subspaces (lines) M
and N, where M is the line through the origin with slope 2 and N is the
line through the origin with slope 3, by adding a vector in M to a vector
in N? The vectors in M are those of the form (α, 2α) and the vectors in N
are those of the form (β, 3β). The question is: which vectors can be
represented in the form (α + β, 2α + 3β) as α and β are allowed to vary over all
real numbers? The answer is easy enough to figure out (all vectors in R²),
but there is a general concept here, waiting to be recognized, that’s much
more useful than the special answer.
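
The special answer can be checked by solving for the two parameters explicitly. What follows is a minimal sketch in Python, assuming exact rational arithmetic; the helper name split_along_lines is hypothetical and serves only as an illustration. Given a target vector (a, b), it produces one summand on the line of slope 2 and one on the line of slope 3.

```python
from fractions import Fraction

def split_along_lines(a, b):
    """Return (m, n) with m on the slope-2 line, n on the slope-3 line, m + n = (a, b)."""
    a, b = Fraction(a), Fraction(b)
    # Solve  alpha + beta = a,  2*alpha + 3*beta = b  by elimination.
    alpha = 3 * a - b
    beta = b - 2 * a
    m = (alpha, 2 * alpha)   # a vector on the line with slope 2
    n = (beta, 3 * beta)     # a vector on the line with slope 3
    return m, n

m, n = split_along_lines(5, -7)
print(m, n)
```

Since the two equations can be solved for every choice of a and b, every vector of the plane is obtained, which is the answer stated above.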

If M and N are subspaces of a vector space V, which vectors can be
obtained by adding a vector in M to a vector in N? That is: choose x in M
and y in N, form x + y, and ask which vectors can be so represented as x
and y vary over all vectors in M and N respectively. Whatever the answer,
the symbol for it is

M + N.


This is a new operation on subspaces, a kind of addition. Its main use comes 
from its relation to spans. 

Since 0 belongs to every subspace, it follows that both M and N are
included in M + N, and hence that M ∪ N ⊂ M + N. Since M + N is a
subspace (that is easy to check), it follows that


⋁(M ∪ N) ⊂ M + N.


(Right? If a subspace includes a set E, then it includes ⋁E.) But both M
and N are included in ⋁(M ∪ N), and ⋁(M ∪ N) is closed under vector
addition. It follows that M + N ⊂ ⋁(M ∪ N), and therefore, finally, that


⋁(M ∪ N) = M + N.


Summary: to form the span of two subspaces is the same as forming their 
sum. 

Addition of subspaces is a curious operation. It is commutative and
associative—that’s easy. It has an identity, namely the trivial subspace O.
It does not, however, make the set of subspaces of a vector space into a
group—inverses do not exist. Indeed, since M ⊂ M + N whenever M and
N are subspaces, it follows that M + N = O is out of the question unless
M = N = O. Note also that M + M = M—not the sort of behavior that
groups permit. A related unorthodox property of subspace addition is that
M + N = N can happen quite easily even when M ≠ O. (Under what
conditions does it happen? Answer: if and only if M ⊂ N.)

Subspaces have a kind of multiplicative structure too, namely intersection.
Intersection is commutative and associative—that's easy—and it
has an identity, namely the improper subspace V. That's as far as the good
properties go. Inverses do not exist. Indeed: M ⊃ M ∩ N whenever M and N
are subspaces, so that M ∩ N = V is out of the question unless M = N = V.
Note also that M ∩ M = M, and that M ∩ N = N can happen quite easily,
namely just when N ⊂ M.

Related to the additive and multiplicative structure of subspaces and 
symmetrically connected with both there is a geometrically important pos- 
sibility that has some of the properties of set-theoretic complementation. 
It's best to begin its study by looking at some examples. 

Consider two distinct non-trivial proper subspaces M and N of R² (or,
if geometric language is preferred, consider two distinct lines in the
ordinary Euclidean plane). It cannot be true that M + N = O (that is, M
and N are not additive inverses of one another), and it cannot be true that
M ∩ N = R² (that is, M and N are not multiplicative inverses of one another).
The extreme opposite is true: M + N = R² and M ∩ N = O. (Look at the
picture.)

Another example. Let M be the set of all even polynomials (with, say, 
real coefficients) and N the set of all odd ones; the definitions of these terms 
appear in Problem 25. Can a polynomial be both even and odd? A moment’s 
thought should reveal the answer: if and only if it is identically zero. Note 
that it follows that both M and N are subspaces of the vector space P of 
all polynomials. (Caution: if the underlying field is such that 1 + 1 = 0, the 
two definitions of evenness and oddness are not equivalent. Over the field of 
integers modulo 2, for instance, every polynomial is both even and odd in the 
second sense.) 

Can every polynomial be written as a sum of an even one and an odd 
one? Sure: given p, define q and r by


q(x) = ½(p(x) + p(−x))   and   r(x) = ½(p(x) − p(−x)),

and verify that q is even, r is odd, and p = q + r. Conclusion:


M ∩ N = O and M + N = P.
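
The decomposition into even and odd parts is easy to carry out on coefficients. The sketch below is a minimal illustration in Python, assuming that a polynomial is stored as a list of rational coefficients in increasing degree and that division by 2 is permitted (so the field does not have 1 + 1 = 0); the function names reflect and even_odd_parts are hypothetical.

```python
from fractions import Fraction

def reflect(coeffs):
    """Coefficients of p(-x), given those of p(x) in increasing degree."""
    return [c if k % 2 == 0 else -c for k, c in enumerate(coeffs)]

def even_odd_parts(coeffs):
    """Return (q, r) with q(x) = (p(x) + p(-x))/2 and r(x) = (p(x) - p(-x))/2."""
    p = [Fraction(c) for c in coeffs]
    p_reflected = reflect(p)
    q = [(a + b) / 2 for a, b in zip(p, p_reflected)]
    r = [(a - b) / 2 for a, b in zip(p, p_reflected)]
    return q, r

# p(x) = 3 - x + 4x^2 + 5x^3
q, r = even_odd_parts([3, -1, 4, 5])
print(q, r)
```

The even part keeps exactly the coefficients of even powers and the odd part keeps the rest, so the two parts add up to p and have only the zero polynomial in common.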


In a general vector space V two subspaces M and N are called 
complements of one another if 


M ∩ N = O and M + N = V.


The concept is illustrated by the examples: every line in the plane is a com- 
plement of every other line (no hope of uniqueness), and the subspaces of 
even and odd polynomials are complements in P. A trivial example can be
given in any V, namely the trivial subspace O and the improper subspace V.
Does every subspace in every vector space have at least one complement? 
Does every non-trivial proper subspace in every vector space have many com- 
plements? (What does “many” mean? It is a relative notion; it depends on 
the coefficient field. For finite fields “many” might just mean “more than 
one”.) The answers are yes both times, but the proofs depend on set-theoretic 
techniques (such as Zorn’s lemma) foreign to the spirit of introductory linear 
algebra. For the vector spaces that will presently start occupying the center 
of the stage the answer will be obtained by more easily accessible methods.

The set of subspaces with addition (+) and intersection (∩) misses
being a field because neither operation admits inverses. What about the
connection between the two operations: how well behaved is it? A frontal
attack on the question would try to prove or disprove the distributive law. 
It is advisable to approach the question more modestly by asking about 
easier algebraic properties of subspace addition. One well-known possible 
property is called the modular identity; that’s what this problem is about. 


Problem 28. Is it true that if L, M, and N are subspaces of a vector
space, then


L ∩ (M + (L ∩ N)) = (L ∩ M) + (L ∩ N)?


29. Distributive subspaces 


Problem 29. For which vector spaces V is it true that if L, M, and 
N are subspaces of V, then 


L ∩ (M + N) = (L ∩ M) + (L ∩ N)?


30. Total sets 


Is there a sub[set] E of a vector space V such that the only sub[space] of V 
that includes E is V itself? Sure: several such examples have already been 
seen. An example in R¹ is the singleton {x} of any non-zero x; an example
in R² is E = {(1, 0), (0, 1)}.

If the only subspace of V that includes E is V, then, of course, the 
intersection of all the subspaces that include E is just V, so that 


⋁E = V.


A set E with this property, a set whose span is the entire vector space, is 
called a total set. By a slight extension of the language, a set E that spans 
a subspace M of V is called total for M. In the vector space P₂ of all
polynomials of degree less than or equal to 2 the set


E = {1, 1 + x, 1 + x + x²}


is a total set, and in the larger vector space P of all polynomials the infinite
set


E = {1, x, x², x³, ...}


of monomials is a total set. For the space O, the empty set is total.

The good vector spaces in linear algebra, the easiest ones to work with
and the ones that the subject is rooted in, are the ones that have a finite
total set; vector spaces like that are called finite-dimensional. The space
P₂ is finite-dimensional, but (see Problem 27) the space P is not.
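
That the three polynomials 1, 1 + x, and 1 + x + x² form a total set for P₂ can be confirmed by solving for the coefficients directly. Here is a minimal sketch in Python, assuming a polynomial a + bx + cx² is stored as the triple (a, b, c); the function names coefficients_in_E and expand are hypothetical.

```python
def coefficients_in_E(a, b, c):
    """Scalars (s, t, u) with s*1 + t*(1 + x) + u*(1 + x + x^2) = a + b*x + c*x^2."""
    u = c          # only the third element of E has an x^2 term
    t = b - u      # the coefficient of x is t + u
    s = a - t - u  # the constant coefficient is s + t + u
    return s, t, u

def expand(s, t, u):
    """Coefficients (constant, x, x^2) of s*1 + t*(1 + x) + u*(1 + x + x^2)."""
    return (s + t + u, t + u, u)

s, t, u = coefficients_in_E(4, -3, 7)
print((s, t, u), expand(s, t, u))
```

The three scalars are determined one at a time, from the highest power down, and the procedure never fails; that is all that totality asks for.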

The first natural question about finite-dimensional vector spaces 
sounds deceptively simple: is every subspace of a finite-dimensional vector 
space finite-dimensional? That sounds like asking whether every subset of 
a finite set is finite, but it is not. The question is surprisingly delicate; it is 
not the kind for which all that's necessary is to feed the definitions into a 
machine and turn the crank. Here is a step toward acquiring the necessary
insight.


Problem 30. If E is a total subset of a vector space V, and if M is
a subspace of V, does it follow that some subset of E is total for M? 


31. Dependence 


The three vectors


x = (1, 0), y = (0, 1), and z = (1, 1)


form a total set for R²; in fact the first two are enough and the third one
is superfluous. There is a simple doctrine at work here: adjoining extra 
vectors to a total set leaves it total. The new vectors do no harm, but they 
give no new information. 

The vector z in this example is the sum of x and y, and that makes
it obvious that every linear combination of x, y, and z is already a linear
combination of x and y. The presence of superfluous vectors in a total set is
not always so clearly visible. For a look at a more hidden kind of superfluity,
let x, y, and z this time be defined by


x = (1, 7), y = (2, 8), and z = (3, 6).


It doesn't jump to the eye that x, y, and z form a total set, but they do—and
it doesn't jump to the eye that one of x, y, and z is superfluous, but it is
true. One way to become convinced of totality is to verify that


−(2/3)x + (1/3)y + (1/3)z = (1, 0) and −(1/3)x + (2/3)y − (1/3)z = (0, 1).
Once that is granted, totality does jump to the eye: since every vector in
R² is a linear combination of (1, 0) and (0, 1), it follows that every vector
in R² is a linear combination of x, y, and z. As for superfluity: since


4x − 5y + 2z = 0,


it follows that z is “superfluous” in the sense that z is a linear combination
of x and y (z = (5/2)y − 2x). Similarly, of course, x is also superfluous, and so
is y. If any one of x, y, and z is omitted from {x, y, z}, what's left is still a
total set.
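
Both claims about x = (1, 7), y = (2, 8), and z = (3, 6) can be checked with exact arithmetic. The following is a minimal sketch in Python; the helper names combo and solve_2x2 are hypothetical, and Cramer's rule is used here only as one convenient way to find coefficients, not as the method of the text.

```python
from fractions import Fraction

x, y, z = (1, 7), (2, 8), (3, 6)

def combo(coeffs, vectors):
    """Linear combination of plane vectors with the given scalar coefficients."""
    return tuple(sum(Fraction(c) * v[i] for c, v in zip(coeffs, vectors))
                 for i in range(2))

def solve_2x2(u, v, target):
    """Scalars (s, t) with s*u + t*v = target, by Cramer's rule (needs det != 0)."""
    det = Fraction(u[0] * v[1] - u[1] * v[0])
    s = (target[0] * v[1] - target[1] * v[0]) / det
    t = (u[0] * target[1] - u[1] * target[0]) / det
    return s, t

print(combo((4, -5, 2), (x, y, z)))   # the non-trivial combination that vanishes
print(solve_2x2(x, y, (1, 0)))        # (1, 0) in terms of x and y
print(solve_2x2(x, y, (0, 1)))        # (0, 1) in terms of x and y
print(solve_2x2(x, y, z))             # z in terms of x and y
```

The first line confirms the dependence relation, the next two show that x and y alone already reach (1, 0) and (0, 1), and the last recovers the expression of z in terms of x and y.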

The way x, y, and z in these examples depend on one another (“depend”
is the crucial word) is an instance of the basic general concept called
dependence. No matter what x, y, and z are, it is always possible to find scalars
α, β, and γ so that the linear combination αx + βy + γz becomes 0; just
choose α = 0, β = 0, and γ = 0. That's a trivial statement, and the linear
combination 0·x + 0·y + 0·z is justly called the trivial linear combination. The
same language is used for any finite number of vectors. With that settled, the
ground is prepared for the appropriate general definition: a finite set of
vectors is called dependent (usually the longer expression linearly dependent
is used) if some non-trivial linear combination of them vanishes. If the set is
{x₁, ..., xₙ}, then dependence means that there exist scalars α₁, ..., αₙ, not
all zero, such that


α₁x₁ + ⋯ + αₙxₙ = 0.


Example: no matter what vector x is, the set {0, x} is dependent. Reason:
1·0 + 0·x = 0. (Note: the scalar coefficients are 1 and 0, and they are not all
zero.) A trivial example of a dependent set is the set consisting of the vector
0 alone. Reason: 1·0 = 0. Here is a more nearly typical example: if x and y
are arbitrary vectors, the set {x, y, x + y} is dependent. Reason:


1·x + 1·y + (−1)·(x + y) = 0.


Still another: if x, y, u, and v are arbitrary vectors, then the set


{x, y, x + y, u, v}


is dependent. Reason:


1·x + 1·y + (−1)·(x + y) + 0·u + 0·v = 0.


This last example illustrates that in at least one respect dependence behaves 
the way totality does: adjoining extra vectors doesn’t change the property. A 
set larger than a dependent set is still dependent. 

Here is a final easy example of dependence in the concrete vector space
R¹: if x and y are any two vectors in that space (that is, any two real numbers),
then the set {x, y} is dependent. Reason: if both x and y are 0, the assertion
is trivial; if at least one of them is different from 0, then yx + (−x)y is a
non-trivial linear combination that vanishes.

The concept of dependence was introduced via a discussion of “superfluous”
vectors in total sets. Does the same connection between dependence
and superfluity hold in general?


Problem 31. If a vector x₀ is a linear combination of {x₁, ..., xₙ},
does it follow that the set {x₀, x₁, ..., xₙ} is dependent? If, conversely,
a finite set {x₀, x₁, ..., xₙ} of vectors is dependent, does it
follow that at least one of them is a linear combination of the others?


32. Independence 


If x is a non-zero vector, then the only linear combination of the set {x} 
that can vanish is the trivial one—or, in plain English, the only time αx can
be 0 is when α = 0. In other words, with one exception the singleton {x}
is not dependent. 

In the vector space R², if x = (1, 0) and y = (0, 1), then the only linear
combination of the set {x, y} that can vanish is the trivial one—that is, the
only time αx + βy can be zero is when α = β = 0. In other words, the pair
{x, y} is not dependent.

General definition: a set that is not dependent is called independent 
(usually linearly independent). 

Dependence and independence are properties of sets (of vectors), but
most people find it comfortable to speak a little loosely and apply the
adjectives to vectors. Instead of speaking of an “independent set” [of vectors],
they speak of [a set of] “independent vectors”. The slightly less sharp usage
isn't really dangerous.

It is often convenient to extend the use of the language to two extreme 
cases, very large sets and very small sets. Very large: infinite. Very small: 
empty. 

An infinite set is called independent if every finite subset of it is
independent. Example: the monomials 1, x, x², x³, ... form an infinite
independent set in P. Reason: the only time a linear combination of powers is
the zero polynomial is when every coefficient is zero. (This, by the way, is 
not a statement about the algebra of polynomials: it is merely a reminder 
of what "zero polynomial" means in contexts such as this.) 

Is the empty set dependent or independent? The question is not in- 
trinsically important, but it would be inconvenient to proceed without ex- 
amining it. The point is that the empty set is quite likely to occur in the 
middle of a deduction (when, for instance, the intersection of two sets has 
to be formed), and it would be awkward to have to keep making case
distinctions.

The best way (the only convincing way?) to answer questions about the
empty set (questions of “vacuous implication”) is to ask how they can be 
false. The present question is this: if a linear combination of the empty set is 
0, does it follow that every coefficient must be 0? How could that be false? 
It is false only if some non-trivial linear combination of the empty set turns 
out to have the value 0. To say that a linear combination is non-trivial means 
that it has at least one coefficient different from 0—and that cannot happen. 
Reason: there are no coefficients at all, and hence, in particular, there is no 
coefficient different from 0. Conclusion: the assertion that the empty set is 
independent cannot be false. A consistent use of language demands that the 
empty set be declared independent. Note, by the way, that this conclusion 
is in harmony with the assertion that every subset of an independent set is 
independent. 

“Independent” is the accepted dignified way to say of a set of vectors 
that it has no “superfluous” elements. An independent total set in a vector 
space V is called a basis of V. Examples: if x is any real number different
from 0, then {x} is a basis for R¹; if


x = (1, 0) and y = (0, 1),

then {x, y} is a basis for R², and so is {x − y, x + y}; the monomials


1, x, x², x³, ...


constitute a basis for P; the monomials 1, x, x², x³, x⁴, x⁵ constitute a basis
for P₅.
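
The assertion about the pair x − y, x + y in the plane can be tested numerically. The sketch below is a minimal illustration in Python, assuming the standard fact that two vectors of the plane form a basis exactly when their 2 × 2 determinant is non-zero; the names det2 and coordinates are hypothetical.

```python
from fractions import Fraction

x, y = (1, 0), (0, 1)
d = (x[0] - y[0], x[1] - y[1])   # x - y
s = (x[0] + y[0], x[1] + y[1])   # x + y

def det2(u, v):
    """Determinant of the 2 x 2 array whose rows are u and v."""
    return u[0] * v[1] - u[1] * v[0]

def coordinates(w, u, v):
    """Scalars (a, b) with a*u + b*v = w, assuming det2(u, v) != 0."""
    det = Fraction(det2(u, v))
    return (det2(w, v) / det, det2(u, w) / det)

print(det2(d, s))                 # non-zero, so the pair is independent and total
print(coordinates((3, 5), d, s))  # coordinates of (3, 5) with respect to the new basis
```

A non-zero determinant settles both halves of the definition at once: only the trivial combination vanishes, and every vector of the plane has coordinates.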

Does the vector space O (consisting of the vector 0 only) have a basis? 
Since the only possible element in a basis for O is 0, and since the set {0} is
dependent, it looks as if the answer must be no—but that's wrong. The answer 
is yes; the space O does have a basis, namely the empty set. Indeed: the empty 
set is independent and its span contains 0. This sort of thing looks strange on 
first encounter, but it's easy to get used to, and it works smoothly—there is 
nothing wrong, either logically or linguistically. 

Do vector spaces always have bases? That's a surprisingly difficult ques- 
tion; the techniques for the general answer are necessarily transfinite. For 
finite-dimensional vector spaces, however, the tools already available are
adequate.


Problem 32. Does every finite-dimensional vector space have a 
finite basis? 


CHAPTER 3 


BASES 


33. Exchanging bases 


The most useful questions about total sets, and, in particular, about bases, 
are not so much how to make them, but how to change them. Which vectors 
can be used to replace some element of a prescribed total set and have it 
remain total? Which sets of vectors can be used to replace some subset of 
a prescribed total set and have it remain total? What restriction is imposed 
by the relation between the prescribed set and the prescribed total set? 


Problem 33. Under what conditions on a total set T of a vector 
space V and a finite subset E of V does there exist a subset F of T 
such that (T − F) ∪ E is total for V?


Does that sound awkward? In less stilted language the question is this: 
under what conditions can one replace a part of a total set by a prescribed 
set without ruining totality? 


Comment. The way the problem is stated the answer is “always”: just take 
F = ∅. Consequence: it is necessary to think about the problem before
beginning to solve it. Under what conditions on T and E and F does the 
question make good sense? 


34. Simultaneous complements 


If M is a subspace of a vector space V, a complement of M was defined in
Problem 28 as a subspace N of V such that M ∩ N = {0} and M + N =
V. (Recall that M + N denotes the set of all vectors of the form x + y
with x ∈ M and y ∈ N, or, which for subspaces comes to the same thing,
it denotes the span of the set M ∪ N.) It is easy for a subspace to have
more than one complement, or, to put the same thing another way, it is 
easy for several subspaces to have a "simultaneous" complement, meaning 
a complement in common. It's easy enough, but that doesn't mean that 
it always happens. Sample question (which will cause even the experts to 
think for a nanosecond): if two subspaces are complements, can they have 
a simultaneous complement? Must they always have one? 


Problem 34. Under what conditions does a finite collection of subspaces
of a finite-dimensional vector space have a simultaneous complement?


35. Examples of independence 


Linear independence is one of the most important concepts of linear alge- 
bra. A good way to acquire it in one's bloodstream is to look at many ex- 
amples, and this problem, and several of the ones that follow, are intended 
to provide some practice in the use of the concept. Most such problems re- 
quire very little thought—just a little work will solve them. 


Problem 35. (a) For which real numbers x is it true that the vectors
x and 1 are linearly independent in the vector space R of real numbers
(over the field Q of rational numbers)?

(b) Under what conditions on the scalar ξ are the vectors
(1 + ξ, 1 − ξ) and (1 − ξ, 1 + ξ) in R² (over the field Q of rational
numbers) linearly independent?


36. Independence over R and Q 


Problem 36. Is there a subset of R² that is independent over Q but
dependent over R?


37. Independence in C²


Problem 37. (a) Under what conditions on the scalars α and β are
the vectors (1, α) and (1, β) in C² linearly independent?
(b) Is there a set of three linearly independent vectors in C²?


38. Vectors common to different bases


Problem 38. (a) Do there exist two bases in C⁴ such that the only
vectors common to them are (0, 0, 1, 1) and (1, 1, 0, 0)?

(b) Do there exist two bases in C⁴ that have no vectors in common
so that one of them contains the vectors (1, 0, 0, 0) and (1, 1, 0, 0)
and the other one contains the vectors (1, 1, 1, 0) and (1, 1, 1, 1)?


39. Bases in C³


Problem 39. (a) Under what conditions on the scalar x do the vectors
(1, 1, 1) and (1, x, x²) form a basis of C³?

(b) Under what conditions on the scalar x do the vectors (0, 1, x),
(x, 0, 1), and (x, 1, 1 + x) form a basis of C³?


40. Maximal independent sets 


Problem 40. If X is the set consisting of the six vectors in R⁴,


(1, 1, 0, 0), (1,0, 1,0), (1,0,0, 1), 
(0, 1, 1,0), (0, 1, 0, 1), (0,0, 1, 1), 


do there exist two different maximal linearly independent subsets of 
X? 


(A maximal linearly independent subset of X is a subset Y of X that 
becomes linearly dependent every time that a vector of X that is not already 
in Y is adjoined to Y.) 


41. Complex as real 


A vector space is not only a set of vectors; it is a set of vectors together 
with a coefficient field that acts on it. It follows that one and the same set 
of vectors can well be a vector space in several different ways, depending on 
what scalars are admitted. So, for instance, the set C of complex numbers 
is a vector space over the field C, but that's not especially thrilling; what is 
more interesting is that C is a vector space over the field R of real numbers: 
just forget that multiplication by non-real scalars is possible. The following 
question is a generalization of the one just hinted at.


Problem 41. Every complex vector space V is intimately associated
with a real vector space V^real; the space V^real is obtained from V by
refusing to multiply vectors in V by anything other than real scalars.
If the dimension of the complex vector space V is n, what is the
dimension of the real vector space V^real?


42. Subspaces of full dimension 


Problem 42. Can a proper subspace of a finite-dimensional vector 
space have the same dimension as the whole space? 


43. Extended bases 


Which vectors are fit to belong to a basis of a vector space? The vector 0 is 
not; is that the only exception? Which sets of vectors are fit to be subsets 
of a basis? A dependent set is not; is that the only exception? 

Consider a special example. The vectors


x₁ = (1, 0, 0, 0),
x₂ = (0, 1, 0, 0),
x₃ = (0, 0, 1, 0),
x₄ = (0, 0, 0, 1),


form a basis for C⁴—that’s easy. Suppose, however, that for some
application a different basis is needed, one that contains the vectors


u = (1,1,1,1) 
and 
v = (1,2,3,4). 


Is there such a basis? What is easy to check is that u and v are independent, 
so that they might be fit to be part of a basis, but the question is whether 
independence by itself is a sufficient condition. 
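
The independence of u and v is indeed easy to check, by hand or by machine. A minimal sketch in Python follows, assuming the standard criterion that two vectors are dependent exactly when every 2 × 2 minor formed from their coordinates vanishes; the function name independent_pair is hypothetical.

```python
from itertools import combinations

u = (1, 1, 1, 1)
v = (1, 2, 3, 4)

def independent_pair(u, v):
    """True when some 2 x 2 minor of the array with rows u and v is non-zero."""
    return any(u[i] * v[j] - u[j] * v[i] != 0
               for i, j in combinations(range(len(u)), 2))

print(independent_pair(u, v))
```

The check says nothing yet about whether u and v can be completed to a basis; that is the question the problem asks.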


Problem 43. Can every (finite) independent set in a finite-dimensional
vector space be extended to a basis?


44. Finite-dimensional subspaces


In Problem 30 the question arose whether every subspace of a
finite-dimensional vector space is finite-dimensional, but it was not answered there.
Now that the technique of making independent sets larger is at hand, that 
question can be raised with more profit. 


Problem 44. Is every subspace of a finite-dimensional vector space 
finite-dimensional? 


45. Minimal total sets 


Total sets are "large" in some rough sense, and, in particular, it is obviously 
true that if a set is total, then any larger set is necessarily total also. This 
obvious remark calls attention to the fact that sometimes it is possible to 
omit some of the elements of a total set and still end up with a total set. 
When that is not possible, it is natural to call the total set minimal. Some 
total sets are minimal, and some (for example bases) are independent. Is 
there an implication relation between these two possible properties of total 
sets, either way? 


Problem 45. Is every minimal total set independent? Is every inde- 
pendent total set minimal? 


46. Existence of minimal total sets 


Minimal total sets exist all right (any basis is one), but how easy are they 
to come by in prescribed contexts? 


Problem 46. Does every total set have a minimal total subset? 


47. Infinitely total sets 


Do there exist total sets that remain total when any one of their elements 
is discarded, but cease being total if an appropriately chosen set of two 
elements is discarded? Caution: the question is about sets not sequences; 
duplication of elements is not appropriate in this context. If, for instance, 
V is a 2-dimensional vector space, and {x, y} is a basis for V, then the
"set" {x, x, y, y} is not an acceptable answer. A small modification of this
unacceptable construction does, however, yield an answer, namely the set
{x, 2x, y, 2y} (provided that the underlying scalar field does not have
characteristic 2). An obvious extension of the technique gives for each positive
integer n a total set that remains total when any n of its elements are
discarded, but ceases being total if an appropriately chosen set of n + 1
elements is discarded. It is even possible for a set to be infinitely total in
the sense that it remains total when any finite subset of it is discarded. A
specific example for a 2-dimensional vector space with basis {x, y} over R,
say, is the set


{x, y, 2x, 2y, 3x, 3y, ...}.


In this example even infinite omissions can be allowed, if they are carefully 
made. If every term after the second one is omitted (that’s being very care- 
ful), the result is still total, but if all the terms in even positions are omitted 
(not careful enough), the remainder is not total. Is this a very special case, 
or are its properties shared by all infinitely total sets? 


Problem 47. Does every infinitely total set E have an infinite subset 
F such that the relative complement E − F is total?


48. Relatively independent sets 


Every set of n + 1 vectors in an n-dimensional vector space is dependent.
It is, however, trivial to find three vectors in R² such that no two of them
are dependent (or, in geometric language, such that no two of them are
collinear with the origin), and it is equally trivial to find four vectors in R³
such that no three of them are coplanar with the origin, and, generally, it
is easy to find n + 1 vectors in Rⁿ such that every n of them constitute
an independent set. It is temporarily convenient to call a subset E of Rⁿ
with this property relatively independent, the property being that every n
vectors in E are independent. A relatively independent set in Rⁿ can have
n + 1 vectors; can the number n + 1 be improved?
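
One concrete choice of n + 1 such vectors is the standard basis together with the vector all of whose coordinates are 1. The sketch below, in Python, tests that choice for small n; it is an illustration under the assumption that independence of n vectors in Rⁿ is detected by a non-zero determinant, and the names det, candidate, and every_n_independent are hypothetical.

```python
from itertools import combinations

def det(rows):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    if len(rows) == 1:
        return rows[0][0]
    total = 0
    for j, entry in enumerate(rows[0]):
        minor = [row[:j] + row[j + 1:] for row in rows[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def candidate(n):
    """The n standard basis vectors of R^n together with (1, 1, ..., 1)."""
    basis = [tuple(int(i == j) for j in range(n)) for i in range(n)]
    return basis + [tuple([1] * n)]

def every_n_independent(vectors, n):
    """True when each choice of n of the given vectors has non-zero determinant."""
    return all(det(list(subset)) != 0 for subset in combinations(vectors, n))

print(every_n_independent(candidate(3), 3))
print(every_n_independent(candidate(4), 4))
```

The sketch only confirms that n + 1 is attainable in the cases tried; whether a larger number is possible is left to the problem.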


Problem 48. What is the largest possible number of vectors in a 
relatively independent subset of Rⁿ?


49. Number of bases in a finite vector space 


Properties of the coefficient field of a vector space obviously have an effect 
on the linear algebraic properties of the space. Finite fields are especially 
important in some applications, and the subject as a whole is not properly 
understood without at least a little insight into how they work. 

The best known examples of finite fields are the ones of the form Zₚ,
that is, the integers modulo p, where p is a prime. These examples are not
the only ones; see Problem 19. Granted that they exist, linear algebra can
be used to prove a little theorem about them, namely that the number
of elements is always a power of a prime. The necessary tools from field
theory are these: a finite field must have prime characteristic; every field of
characteristic p has a subfield isomorphic to Zₚ; every field is a vector space
over any subfield. (These statements belong to the part of field theory that 
is right next to the definitions; they are easy to prove.) An additional tool 
that is needed is from linear algebra, and it will be discussed later; the proof 
of the “little theorem” will be postponed till then. 

It is true that if q is a power of a prime, then there does indeed exist 
a field with q elements (and, to within a change of notation, there is only 
one such field); the case q = 2² discussed in Problem 19 is more or less
typical. A typical vector space over a field F with q (= pᵏ) elements is Fⁿ.
How many vectors does that vector space contain? The answer is qⁿ and
that's easy. The following considerably trickier counting question asks not
for the number of elements but for the number of subsets of a certain kind. 
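
For the prime fields the easy count can be confirmed by brute force. This is a minimal sketch in Python, assuming q = p is a prime so that the integers 0, 1, ..., p − 1 stand for the elements of the field; the function name all_vectors is hypothetical, and the sketch counts vectors only, not bases.

```python
from itertools import product

def all_vectors(p, n):
    """All n-tuples with entries in {0, 1, ..., p - 1}: the vectors of (Z_p)^n."""
    return list(product(range(p), repeat=n))

print(len(all_vectors(3, 2)), 3 ** 2)
print(len(all_vectors(2, 3)), 2 ** 3)
```

Each of the n coordinates can be chosen in p ways independently of the others, which is the whole reason for the count.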


Problem 49. If F is a field with q elements, how many bases are
there in Fⁿ?


50. Direct sums 


The Euclidean plane can be viewed as the result of a construction that 
starts from two lines (the x-axis and the y-axis) and puts them together to
form a new vector space. That construction is an instance of a general one;
other instances of it occur throughout linear algebra (or, for that matter,
throughout mathematics). If U and V are vector spaces, then their direct
sum, denoted by


U ⊕ V,

is the set of all ordered pairs (x, y), with x in U and y in V, and with vector
addition and scalar multiplication defined by the natural equations

(x₁, y₁) + (x₂, y₂) = (x₁ + x₂, y₁ + y₂)


and


α(x, y) = (αx, αy).


The vectors x (in U) and y (in V) are called the coordinates of (x, y) (in
U ⊕ V). In that language, the definitions just described can be expressed
by saying that the linear operations in U ⊕ V are defined coordinatewise. It
must, of course, be checked that the definitions are correct, meaning that
they do indeed define a vector space. That check is painless, and requires
no new techniques; it’s no different from the proof that if F is a field, then
F^2 is a vector space. That familiar assertion is, in fact, a special case of what
is now being asserted; in the present language F^2 is the direct sum of F^1
and F^1.
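
The coordinatewise definitions can be tried out on a small sample. The following is only a sketch, assuming that the field F is the field of rational numbers and that elements of F^1 ⊕ F^1 are stored as Python pairs; the names ds_add and ds_scale are invented for this illustration.

```python
from fractions import Fraction

def ds_add(p, q):
    """Coordinatewise sum: (x1, y1) + (x2, y2) = (x1 + x2, y1 + y2)."""
    (x1, y1), (x2, y2) = p, q
    return (x1 + x2, y1 + y2)

def ds_scale(alpha, p):
    """Coordinatewise scalar multiple: alpha (x, y) = (alpha x, alpha y)."""
    x, y = p
    return (alpha * x, alpha * y)

# Two sample elements of F^1 (+) F^1, with F assumed to be the rationals.
p = (Fraction(1, 2), Fraction(3))
q = (Fraction(5), Fraction(-1, 3))
alpha = Fraction(2, 7)

# One vector space axiom checked on this sample: alpha (p + q) = alpha p + alpha q.
lhs = ds_scale(alpha, ds_add(p, q))
rhs = ds_add(ds_scale(alpha, p), ds_scale(alpha, q))
print(lhs == rhs)
```

A check on one sample is, of course, not a proof; the proof is the painless verification mentioned above.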

If U and V are well-behaved vector spaces, how well-behaved is U ⊕ V?


Problem 50. If U and V are finite-dimensional vector spaces, of
dimensions n and m respectively, what is the dimension of U ⊕ V?


Reminder. The dimension of a (finite-dimensional) vector space V was
defined in Solution 33 as the number of elements in a basis of V, immediately
after the statement that all bases have the same number of elements.


51. Quotient spaces 


If M is a subspace of a vector space V, then there are, usually, many sub- 
spaces N such that 


M ∩ N = O
and
M + N = V,


or, in other words, M can have many complements, and there is no nat- 
ural way of choosing one from among them. There is, however, a natural 
construction that associates with M and V a new vector space that plays, 
for all practical purposes, the role of a complement of M. The theoreti- 
cal advantage that the construction has over the formation of an arbitrary 
complement inside V is precisely its “natural” character, that is, it does not 
depend on choosing a basis, or, for that matter, on choosing anything at all. 

To understand the construction it is a good idea to keep a picture in 
mind. Suppose, for instance, that V = R^2 and that M consists of all those
vectors (x_1, x_2) for which x_2 = 0 (the horizontal axis). Each complement
of M is a line (other than the horizontal axis) through the origin. Observe 
that each such complement has the property that it intersects every hor- 
izontal line in exactly one point. The idea of the construction to be de- 
scribed now is to make a vector space out of the set of all horizontal lines. 

Begin (back in the general case) by using M to single out certain subsets
of V. If x is an arbitrary vector in V, the set x + M consisting of all the
vectors of the form x + y with y in M is called a coset of M, and sets like that
are the ones that are of interest now. As for the notation: it is consistent
with that used before for vector sums (see Problem 28). In the case of the
plane-line example, the cosets are the horizontal lines. Note that one and
the same coset can arise from many different vectors: it is quite possible
that x + M = y + M even when x ≠ y. It makes good sense, just the same,
to speak of a coset, say K, of M, without specifying which element (or
elements) K comes from; to say that K is a coset (of M) means simply that
there is at least one x such that K = x + M.

If H and K are cosets (of M), the vector sum H + K is also a coset of 
M. Indeed, if 


H = x + M
and
K = y + M,


then every element of H + K belongs to the coset (x + y) + M (note that
M + M = M), and, conversely, every element of (x + y) + M is in H + K.
(If, for instance, z is in M, then (x + y) + z = (x + z) + (y + 0).) In other
words,


H + K = (x + y) + M,


so that H + K is a coset, as asserted. 
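
The coset calculation can be watched in action when everything in sight is finite. The sketch below replaces the real plane by the plane over the field Z_3 (an assumption made only so that each coset is a finite set that a computer can list); M is again the horizontal axis, and the helper names vadd, coset, and set_sum are invented for the illustration.

```python
P = 3  # assumption: work over the field Z_3 so that every coset is finite

# M is the "horizontal axis": all vectors whose second coordinate is 0.
M = frozenset((a, 0) for a in range(P))

def vadd(u, v):
    """Vector addition in the plane over Z_3."""
    return ((u[0] + v[0]) % P, (u[1] + v[1]) % P)

def coset(x):
    """The coset x + M."""
    return frozenset(vadd(x, m) for m in M)

def set_sum(H, K):
    """The vector sum H + K of two sets of vectors."""
    return frozenset(vadd(h, k) for h in H for k in K)

x, y = (1, 2), (2, 1)
H, K = coset(x), coset(y)

print(set_sum(H, K) == coset(vadd(x, y)))  # H + K = (x + y) + M
print(set_sum(M, M) == M)                  # M + M = M
print(coset((0, 1)) == coset((2, 1)))      # same coset from different vectors
```

All three comparisons print True. The last one illustrates that x + M = y + M can happen with x ≠ y: here the two vectors differ by (2, 0), which is in M.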
It is easy to verify that coset addition is commutative and associative. 
The coset M (that is, 0 + M) is such that 


K+M=K 


for every coset K, and, moreover, M is the only coset with this property. (If 
(x + M) + (y + M) = x + M, then x + M contains x + y, so that x + y = x + u
for some u in M; this implies that y is in M, and hence that y + M = M.) 
If K is a coset, then the set consisting of all the vectors -u, with u in K, is
itself a coset, which is denoted by


-K.


The coset -K is such that K + (-K) = M, and, moreover, -K is the only
coset with this property. To sum up: with the operation of vector sum, the 
cosets of M form an abelian group. 

If K is a coset and if α is a non-zero scalar, write α · K for the set
consisting of all the vectors αu with u in K; the coset 0 · K is defined to be
M. A simple verification shows that with scalar multiplication so defined 
the cosets of M form a vector space. This vector space is called the quotient 
space of V modulo M; it is denoted by V/M.

The quotient space V/M could have been defined differently. According
to an alternative definition, the elements of V/M are the same as the
elements (vectors) of V, but the concept of equality is redefined: two vectors
are to be regarded as the same if they differ from one another only by a
vector in M. In other words, if x and y are vectors in V, say that x = y modulo
M, best pronounced as “x is congruent to y modulo M”, and perhaps
more honestly written as 


x ≡ y (mod M)


when x - y ∈ M. This alternative formulation is intended to be reminiscent
of the discussion of polynomials modulo (the multiples of) a fixed polynomial
(see Solution 19). There are two approaches to the study of “quotients”:
in one the new elements are sets, with the necessary operations,
such as addition, suitably defined, and in the other the new elements are 
the same as the old ones, and so are the operations, but equality is suitably 
re-defined. The coset approach could have been used in Solution 19 and 
the congruence approach could have been used in the definition of quo- 
tient spaces—and, in what follows, the latter will in fact be used whenever 
it seems convenient to do so. 

There are three constructions that are universal in the sense that they 
occur in every kind of mathematical structure and are important when- 
ever they occur: they are usually referred to by expressions such as sub- 
structures, direct sum structures, and quotient structures. Such construc- 
tions appear, in particular, in group theory, and in topology, and, of course, 
in linear algebra. It is a good idea to acquire some facility in handling them,
and, in the particular case of linear algebra, the most obvious questions 
concern dimensions. 


Problem 51. (a) Is there an example of a vector space V and a
subspace M such that neither M nor V/M is finite-dimensional?

(b) Is there an example of a vector space V and subspaces M
and N such that V/M is finite-dimensional but V/N is not?


Comment. The quotient language and the quotient notation (V/M) 
might strike some people as inappropriate—shouldn't the language and 
the notation indicate subtraction rather than division? Yes and no. In many 
parts of mathematics sets of ordered pairs (such as a direct sum U ⊕ V)
are called Cartesian products, and in such cases it is natural to look at the 
reverse as a kind of division. The trouble is that different parts of linear 
algebra come, historically, from different sources, and the terminological 
clash is unchangeable by now. It's not hard to learn to live with it.


52. Dimension of a quotient space


Problem 52. If V is an n-dimensional vector space and M is an 
m-dimensional subspace, what is the dimension of V/M? 


53. Additivity of dimension 


If M and N are subsets of a set, there is a natural third set associated with
them, namely their union, M ∪ N. If M and N are finite, and if card (for
cardinal number) is used to denote “the number of elements in”, then
sometimes


card(M U N) = card M + card N. 


More precisely, the equation is true when M and N are disjoint (M ∩ N = ∅).
If they are not disjoint, then the right side counts the elements common
to M and N twice, and the equation is false. 

If M and N are subspaces of a vector space, there is a natural third 
subspace associated with them, namely their span, M + N. If M and N are 
finite-dimensional, then sometimes 


dim(M + N) = dimM + dimN. 


More precisely, the equation is true when M and N are disjoint (which
means that M ∩ N = O); otherwise it’s false. What is always true?


Problem 53. If M and N are finite-dimensional subspaces of a vector
space, what relation, if any, is always true among the numbers
dim M, dim N, dim(M + N), and dim(M ∩ N)?


Terminological caution. For subspaces “disjoint” means that their inter- 
section is O, not Ø. Since the zero vector belongs to every subspace, the 
latter is impossible. 


CHAPTER 4 


TRANSFORMATIONS 


54. Linear transformations 


Here is where the action starts. Till now the vectors in a vector space just 
sat there; the action begins when they move, when they change into other 
vectors. A typical example of a change can be seen in the vector space P_5
(all polynomials of degree less than or equal to 5): replace each polynomial
p by its derivative Dp.
What is visible here? If p_1(x) = 3x and p_2(x) = 5x^2, then
Dp_1(x) = 3 and Dp_2(x) = 10x;


if moreover s is the sum p_1 + p_2,


s(x) = 3x + 5x^2,
then


Ds(x) = 3 + 10x.


This simple property of differentiation is from the present point of view its 
most important one: the derivative of a sum is the sum of the derivatives. 
An almost equally important property is illustrated by 


D(7p_2(x)) = 70x;


the general assertion is that


D(αp(x)) = αDp(x)


for any polynomial p and for any scalar α. In words: the derivative of a
scalar multiple is the same scalar multiple of the derivative.

These two features of the change that differentiation effects can be
described by just one statement: whether you form a linear combination 
first and then change, or change first and then form the same linear com- 
bination—the result will be the same. That’s what makes differentiation 
important in linear algebra; the property is described by saying that D is a 
linear transformation. 
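
The two properties of D can be checked mechanically. The following is a sketch only, under the assumption that a polynomial c0 + c1 x + ... + c5 x^5 in the space of polynomials of degree at most 5 is stored as the list [c0, c1, ..., c5]; the names D and combine are invented for the illustration.

```python
def D(p):
    """Derivative of c0 + c1 x + ... + c5 x^5, stored as [c0, ..., c5].

    A zero is appended so that the result has six coefficients again.
    """
    return [k * p[k] for k in range(1, len(p))] + [0]

def combine(a, p, b, q):
    """The linear combination a p + b q, coefficient by coefficient."""
    return [a * s + b * t for s, t in zip(p, q)]

p1 = [0, 3, 0, 0, 0, 0]  # 3x
p2 = [0, 0, 5, 0, 0, 0]  # 5x^2

print(D(combine(1, p1, 1, p2)))  # derivative of 3x + 5x^2
print(D(combine(7, p2, 0, p1)))  # derivative of 7 * 5x^2

# Combine first and then differentiate, or differentiate first and then combine.
print(D(combine(2, p1, -4, p2)) == combine(2, D(p1), -4, D(p2)))
```

The first two lines of output are the coefficient lists of 3 + 10x and 70x, and the final comparison prints True.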

Here is another example: consider the vector space R^3, and stretch
each vector (x, y, z) by a factor of 7. Let S be the symbol for this stretch,
so that S changes (1, 0, 2) (call it u) into (7, 0, 14) (= Su), and S changes
(3, -1, 5) (call it v) into (21, -7, 35) (= Sv), and, generally,


S(x, y, z) = (7x, 7y, 7z).
Look at a linear combination such as 
3u — 2v, 
which is equal to 
(3, 0, 6) - (6, -2, 10)
and therefore to
(-3, 2, -4),
and then stretch to get
(-21, 14, -28).


Other possibility: stretch u and v separately and then form the linear com- 
bination to get 


Su = (7, 0, 14),  Sv = (21, -7, 35)


and


3Su - 2Sv = (21, 0, 42) - (42, -14, 70) = (-21, 14, -28)


—the same final answer. 
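
The same arithmetic can be handed to a machine. This is a small sketch, with vectors stored as Python tuples; the helper name combine is invented for the illustration, and S is the stretch by the factor 7 described above.

```python
def S(v):
    """Stretch each coordinate by a factor of 7."""
    return tuple(7 * c for c in v)

def combine(a, u, b, v):
    """The linear combination a u + b v, coordinate by coordinate."""
    return tuple(a * x + b * y for x, y in zip(u, v))

u = (1, 0, 2)
v = (3, -1, 5)

combine_then_stretch = S(combine(3, u, -2, v))
stretch_then_combine = combine(3, S(u), -2, S(v))

print(combine_then_stretch)  # (-21, 14, -28)
print(stretch_then_combine)  # (-21, 14, -28)
```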

It works every time. Given two vectors (or, for that matter, any finite 
number), if you form a linear combination of them and then stretch, or if 
you stretch each vector first and then form the same linear combination, 
the results will always be the same, and, for that reason, the act of stretching
is called a linear transformation. (Symbols such as Sv and S(1, 0, 2)
are pronounced the way we are all taught when we learn the language of
functions: they are “S of v” and “S of (1, 0, 2)”.)

To understand a mathematical phenomenon it is essential to see several
places where it occurs and several where it does not. In accordance
with that principle, what follows now is a description of each of five trans- 
formations (“changes”) on a vector space; some of them are linear trans- 
formations and some of them are not. 


(1) The vector space is R^2; the transformation T changes each vector by
interchanging its coordinates:
T(x, y) = (y, x).
(2) The vector space is R^2; the transformation T replaces each coordinate
by its square:
T(x, y) = (x^2, y^2).
(3) The vector space is R^2; the transformation T replaces each coordinate
by its exponential:
T(x, y) = (e^x, e^y).


(4) The vector space is P; the transformation T integrates:


Tp(x) = ∫_0^x p(t) dt.


(5) The vector space is R^2; the transformation T replaces each coordinate
by a certain specific linear combination of the two coordinates:


T(x, y) = (2x + 3y, 7x - 5y).


The result is that (1), (4), and (5) define linear transformations and 
(2) and (3) do not. The verification for (1) and (5) is easy. In each case, 
just replace (x, y) by an arbitrary linear combination


α_1(x_1, y_1) + α_2(x_2, y_2),


apply T, and compare the result with the result of doing things in the other
order. (Is “other order” clear? It means apply T to each of (x_1, y_1) and
(x_2, y_2) and then form the linear combination.) In other words, the
verification consists of applying the very definition of linear transformation,
and that yields what is wanted. The truth of (4) depends on known facts 
about integration: the integral of a sum is the sum of the integrals, and the 
integral of a scalar multiple is the scalar multiple of the integral. 
As for (2): everything goes wrong. If it were true that 


(s + t)^2 = s^2 + t^2

for all real numbers s and t (an identity that a few beginning students of
mathematics are in fact tempted to believe), then it would follow that the
transformation T satisfies one of the necessary conditions for being linear.
Namely, it would then be true that

((x_1 + x_2)^2, (y_1 + y_2)^2) = (x_1^2 + x_2^2, y_1^2 + y_2^2),


and hence that


T((x_1, y_1) + (x_2, y_2)) = T(x_1, y_1) + T(x_2, y_2).


But even if that were right, scalar multiples would still misbehave. For
linearity it is necessary that T(αx, αy) should equal αT(x, y), but


T(αx, αy) = (α^2 x^2, α^2 y^2)
and
αT(x, y) = α(x^2, y^2) = (αx^2, αy^2),


and except in the rare cases when α = α^2 the two right sides of these
equations are not eager to be the same. 

Warning: this argument would be regarded with disapproval by many
professional mathematicians. The trouble is that the argument does not
prove that T fails to be a linear transformation; it just points out that the
natural way to try to prove that T is a linear transformation doesn't succeed.
The only convincing way to prove that T does not satisfy the identity
that linearity requires is to exhibit explicitly, with concrete scalars and
vectors, a linear combination that T does not cooperate with. That's easy
enough to do: if, for instance, (x, y) = (1, 1) and α = 2, then


T(αx, αy) = (2^2, 2^2) = (4, 4)
and
αT(x, y) = 2 · (1, 1) = (2, 2).
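
An explicit counterexample is exactly the kind of thing a few lines of code can exhibit. The sketch below evaluates both sides for the squaring transformation of (2) at (1, 1) with the scalar 2, and does the same for the exponential transformation of (3) at (0, 0) with the scalar 2; the function names are invented for the illustration.

```python
import math

def scale(a, v):
    """The scalar multiple a v, coordinate by coordinate."""
    return tuple(a * c for c in v)

def square(v):
    """Example (2): replace each coordinate by its square."""
    x, y = v
    return (x ** 2, y ** 2)

def exponential(v):
    """Example (3): replace each coordinate by its exponential."""
    x, y = v
    return (math.exp(x), math.exp(y))

# Transform the multiple, then take the multiple of the transform.
print(square(scale(2, (1, 1))), scale(2, square((1, 1))))            # (4, 4) (2, 2)
print(exponential(scale(2, (0, 0))), scale(2, exponential((0, 0))))  # (1.0, 1.0) (2.0, 2.0)
```

In each line the two printed vectors differ, and one such disagreement is all that is needed to show that a transformation is not linear.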


The negative assertion that the transformation T described in (3) is
not a linear transformation either is proved similarly. An explicit
counterexample is given by (x, y) = (0, 0) and α = 2; in that case


T(αx, αy) = T(0, 0) = (1, 1)
and


αT(x, y) = 2 · (1, 1) = (2, 2).


Problem 54. (a) Which of the following three definitions of
transformations on R^2 give linear transformations? (The equations are
intended to hold for arbitrary real scalars α, β, γ, δ.)

(1) T(ξ, η) = (αξ + βη, γξ + δη).

(2) T(ξ, η) = (αξ^2 + βη^2, γξ^2 + δη^2).

(3) T(ξ, η) = (α^2 ξ + β^2 η, γ^2 ξ + δ^2 η).

(b) Which of the following three definitions of transformations
on P give linear transformations? (The equations are intended to hold
for arbitrary polynomials p.)

(1) Tp(x) = p(x^2).

(2) Tp(x) = (p(x))^2.

(3) Tp(x) = x^2 p(x).


55. Domain and range 


Integration on the vector space P is a perfectly good linear transformation 
(see Problem 54), but the same equation 


Tp(x) = ∫_0^x p(t) dt        (*)


as the one that worked there does not work on the space P_5; the trouble is
that the degree of the polynomial that it gives may be too large. Right? If,
for instance, p(x) = x^5, then


∫_0^x p(t) dt = [t^6/6]_0^x = x^6/6.


Differentiation on the vector space P_5 might seem to run into similar
trouble—it lowers degrees instead of raising them—but, in fact, there is 
nothing wrong with it. Sure, it's true that D applied to a polynomial of 
degree less than or equal to 5 always yields a polynomial of degree less 
than or equal to 4, but 4 is less than 5, and vectors in P_5 stay in P_5.
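
The contrast between the two examples is a matter of degrees, and degrees are easy to compute. In the sketch below a polynomial is stored as its list of coefficients [c0, c1, ...] (an assumption of the illustration, as are the function names), and exact rational arithmetic is used so that a coefficient such as 1/6 appears as itself.

```python
from fractions import Fraction

def integrate(p):
    """Coefficients of the integral from 0 to x of p(t) dt."""
    return [Fraction(0)] + [Fraction(c, k + 1) for k, c in enumerate(p)]

def differentiate(p):
    """Coefficients of the derivative of p."""
    return [k * p[k] for k in range(1, len(p))]

def degree(p):
    """Largest exponent with a non-zero coefficient (None for the zero polynomial)."""
    nonzero = [k for k, c in enumerate(p) if c != 0]
    return max(nonzero) if nonzero else None

x5 = [0, 0, 0, 0, 0, 1]  # the polynomial x^5

print(degree(integrate(x5)))      # 6: too large for polynomials of degree at most 5
print(integrate(x5)[6])           # 1/6
print(degree(differentiate(x5)))  # 4: still of degree at most 5
```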

These two examples prepare the ground for a small but useful gener- 
alization of the concept of linear transformation and for introducing two 
important constructs associated with each linear transformation. The
generalization is to a transformation (= change, function, mapping, map,
operator, etc.) that changes each vector v in one vector space into a vector
Tv in a possibly different vector space, and does it in such a way that it “co- 
operates” with linear combinations. The technical word is “commutes”; it 
means, of course, that the result of forming a linear combination and then 
transforming is always the same as the result of transforming first and then
forming the linear combination. Symbolically:


T(αu + βv) = αTu + βTv.


Example: if T is the integration defined by the equation (*) above, then
T is a linear transformation from P_5 to P_6 (where the meaning of P_6 is
surely guessable: it is the vector space of all real polynomials of degree
not more than 6). The same equation can also be regarded as defining a
linear transformation from P_5 to P, and many others in similar contexts.
A linear transformation from a vector space V to itself is called a linear 
transformation on V; that's the kind that was introduced in Problem 54. 

The set of vectors where a linear transformation T starts is called the
domain of T, and the set of vectors that result from applying T to them is
called the range of T; the abbreviations


dom T and ran T


are quite commonly accepted. So, for example, if T is differentiation on
P_5, then


dom T = P_5 and ran T = P_4;


if T is differentiation from P_5 to P, then also dom T = P_5 and ran T = P_4.
Some confusion is possible here and should be avoided. Integration can be
considered to define a linear transformation from P_5 to P_6 or from P_5 to
P, or from P_5 to P_200; in each of these cases the domain is P_5 and the range
is a part of P_6. (Which part? The question deserves a moment's thought.)
The vector space that follows the specification “to” plays a much smaller 
role than the range, a subsidiary role, and it does not have a commonly 
accepted name; the word "codomain" is sometimes used for it. 

Important observation: the domain of a linear transformation is always 
a vector space, and so is the range. 

Integration is not the only useful linear transformation from one vec- 
tor space to a different one. The change of variables example in Problem 
54, 


Tp(x) = p(x^2),


can be regarded as a linear transformation on P, or, alternatively, the same
equation can be used to define a linear transformation from P_5 to P_10.
Right? If p(x) = x^4, then Tp(x) = x^8. Similarly the multiplication example
in Problem 54,


Tp(x) = x^2 p(x),


can be regarded as a linear transformation on P, or, alternatively, the same
equation can be used to define a linear transformation from P_5 to P_7. (If
p(x) = x^5, then Tp(x) = x^7.)

The domains and ranges of the linear transformations given as exam- 
ples in Problem 54 are not difficult to find. Challenge: check that for the 
stretching example on R^3 both the domain and the range are equal to R^3,
and for the interchange example ((1) in the discussion of Problem 54), and
for the linear combination example ((5) in the discussion of Problem 54),
both the domain and the range are equal to R^2.

When trying to understand domains and ranges for linear transforma- 
tions between possibly different vector spaces, it is a good idea to study at 
least a few new examples. The problems that follow describe some. 


Problem 55. (1) The set R of all real numbers is a real vector space,
which in that capacity is denoted by R^1. The sum of two real numbers
x and y, considered as vectors, is just the ordinary sum obtained
by considering them as the real numbers they are and adding them;
the multiple of a "vector" (real number) by a "scalar" (real number)
is just the product obtained by forming the product of the two real
numbers. The equation


F(x, y) = x + 2y


defines a linear transformation from R^2 to R^1. This example is a
special case of an important class of linear transformations: a linear
transformation from any real vector space V to the special vector
space R^1 is called a linear functional on V. (The use of “on” here is
in slight collision with the use explained before, but that’s life—with 
a little care confusion can be avoided.) What are the domain and 
the range of the particular linear functional here defined? What, in 
general, can be said about the range of a linear functional? 


(2) Does the equation 
Tp(x) = p(x + 2)


define a linear transformation from P_5 to P_10? If so, what are its
domain and its range? 


(3) Does the equation 
T(x, y, z) = (0, 0)


define a linear transformation from R^3 to R^2? If so, what are its
domain and its range?


(4) Does the equation


T(x, y, z) = (x + 2, y + 2)


define a linear transformation from R^3 to R^2? If so, what are its
domain and its range?


(5) Let R_+ be the set of all positive real numbers, and try to make
it into a real vector space. To do that, it is necessary to define an “ad- 
dition” for any two positive real numbers and to define the “scalar 
multiple” of any positive real number by an arbitrary (not necessarily 
positive) real number. In trying to do that it would be dangerous to use 
the ordinary symbols for addition and multiplication—that way con- 
fusion lies. To avoid that confusion, the sum about to be defined will 
be denoted by ⊞ and the product by ⊡, and the actual definitions
are as follows: if s and t are positive real numbers, then


s ⊞ t = st


(that is, the new sum of s and t is the plain old product of s and t),
and if s is a positive real number and x is an arbitrary real number,
then


x ⊡ s = s^x


(that is, the new product of s by x is s to the power x in the usual
sense). This is a weird procedure, but it works; it actually defines a 
real vector space. Does the equation 


T(s) = log s 


define a linear functional on that vector space? If so, what is its range? 
Caution: what does "log" mean? Does it mean log_10 or log_e or
log_2?


56. Kernel 


There is a real number 0 and (in every vector space) there is a vector 0, and 
no confusion will ever arise between the number 0 and the vector 0; on the 
rare occasions when one threatens, a few cautionary words will dispel it. 

The symbol 0 has more than just two uses in mathematics, and even in 
linear algebra; here, for instance, is a third. Consider any two vector spaces 
V and W, and define a transformation T from V to W by writing 


Tv = 0


for all v in V. Since the description of T warned that its range will be in W,
it is clear that the symbol 0 on the right side of this equation must stand 
for the zero vector in W. Is T a linear transformation? Sure—obviously. 
Form all the linear combinations you like in V and then apply T to them. 
Since T sends everything to 0 in W, the result obtained by applying T first 
and forming linear combinations later will always be 0. That’s just fine: 


T(αu + βv)


is indeed the same as


αTu + βTv


because T(αu + βv) is 0 and αTu + βTv is α · 0 + β · 0, which is also 0.
This very special linear transformation (that can be used between any two 
vector spaces) has a special name, namely 0. 

Linguistic interruption: if T is a linear transformation and v is a vector 
in the domain of T, the corresponding vector Tv in the range is often called 
the image of v, or the transform of v, under the action of T. (Caution: here 
“transform” is the right word, not “transformation”.) So: the zero linear 
transformation from V to W is the one that maps (= sends) every vector 
v to 0, or, in other words, it is the one for which every vector in V has the 
same image, namely 0. 

Zero plays an important role in linear algebra. So, for instance, every 
linear transformation sends 0 to 0. (A precise proof of that comment is 
easy, but it belongs to the hairsplitting axiomatics of the subject, which will 
be treated later.) There could perfectly well be many other vectors, differ- 
ent from 0, that a linear transformation T sends to 0 also. The collection 
(set) of all those vectors gives vital information about T; it is called the 
kernel of T, abbreviated


ker T. 


For a first example, consider the zero transformation from, say, R^2 to
R^3: what is its kernel? In other words: what is the set of all vectors in R^2
that the transformation 0 sends to the vector 0 in R^3? Answer: 0 sends every
vector in R^2 to 0 in R^3; the kernel of 0 is the whole space R^2.

Consider next the linear transformation T on P_5 defined by


Tp = p
for all p. Is it really a linear transformation? Sure—that’s very easy. What 
is its kernel? In other words, what is the set of all polynomials in P_5 whose
image under this T is the zero polynomial? Answer: T can send no polynomial
to 0, except only the polynomial 0; the kernel of this T is the set
consisting of the polynomial 0 only. The cautious notation for that set is
{0}. The braces are needed: it is important to realize and to emphasize 
that the kernel of a linear transformation is always a set. The set might, to 
be sure, consist of just one object, but it is a set just the same. (Analogy: a 
hatbox with just one hat in it is not the same as a hat.) 
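
When the scalars come from a finite field, a kernel can be found by sheer enumeration, and the distinction between a vector and a set of vectors becomes visible on the screen. The sketch below is an illustration under that assumption only: the scalars are the integers modulo 3 (not the real numbers of the examples above), vectors are tuples, and the names kernel, zero, and identity are invented here.

```python
from itertools import product

P = 3  # assumption: scalars are the integers modulo 3, so that spaces are finite

def kernel(T, n):
    """All vectors v with n coordinates that T sends to the zero vector."""
    return {v for v in product(range(P), repeat=n)
            if all(c % P == 0 for c in T(v))}

def zero(v):
    """A zero transformation: every vector goes to the zero vector (0, 0, 0)."""
    return (0, 0, 0)

def identity(v):
    """The identity transformation: every vector goes to itself."""
    return v

print(len(kernel(zero, 2)))  # 9: every one of the 3^2 vectors is in the kernel
print(kernel(identity, 2))   # {(0, 0)}: a set with one element, not a vector
```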

By the way: the equation Tp = p defines a linear transformation on 
every vector space, and that linear transformation is an important one with 
a special name. (Recall that “on” in this context describes a linear transformation
from the given vector space to the same space.) It is always called
the identity transformation, but it is not always denoted by the same sym- 
bol. Some people call it 


1 


(so that the number “one” and the identity transformation have the same 
symbol), others use the letter I, and still others indicate the vector space
under consideration by using the symbol I_V. In this book the first of these
possibilities, the numerical symbol, is the one that will be used most of the
time; “I” will be used when that practice threatens to lead to confusion.


Problem 56. What are the kernels of the linear transformations 
named below? 


(1) The linear transformation T defined by integration, say, for 
instance, 


Tp(x) = ∫_{-3}^{x+9} p(t) dt


from P_5 to P_6.
(2) The linear transformation D of differentiation on V. 
(3) The linear transformation T on R^2 defined by


T(x, y) = (2x + 3y, 7x - 5y);


see example (5) in the discussion of Problem 54.


(4) The linear transformation T from P_5 to P_10 defined by the
change of variables


Tp(x) = p(x^2);
see part (b)(1) of Problem 54.
(5) The linear transformation T on R^2 defined by


T(x, y) = (x, 0).

(6) The linear transformation F from R^2 to R^1 defined by


F(x, y) = x + 2y;
see part (1) of Problem 55.


57. Composition 


Differentiation (denoted by D) is a linear transformation on the vector 
space P of all polynomials, and so is the transformation M (multiplication 
by the variable) defined by 


Mp(x) = xp(x).


What happens to a polynomial if both those transformations act on it, one 
after another? Suppose, to be specific, that D sends p to q, 


q = Dp,
and then M sends q to r,
Mq = r;
what can be said about the passage from p to r? Write
r = Tp,
and, just to see what happens in a special example, let p(x) be
2 + 3x + 4x^2.
In that case
q(x) = 3 + 8x,
and therefore
r(x) = 3x + 8x^2.


Suppose now that the same thing is done not for one p but for two, 
and then a linear combination is formed, so that the result looks like 


α_1 Tp_1(x) + α_2 Tp_2(x).


What if the linear combination had been formed before the two-step
transformation T? Would the result


(T(α_1 p_1 + α_2 p_2))(x)


be the same? In other words, is T a linear transformation? Isn’t it clear that
the answer must be yes? To say that D is linear means that D distributes 
over vector addition and scalar multiplication—that is, that D converts a 
linear combination of vectors into the same linear combination of their 
D images. If M is allowed to act on the linear combination so obtained, 
then the linearity of M means that M distributes over it—that is, that M 
converts it into the same linear combination of the M images of the vectors 
that enter. These two sentences together say that T distributes over linear 
combinations, or, in official language, that T is linear. 

The reasoning just described is quite general: it proves that the 
composition of two linear transformations is a linear transformation. The 
concept of composition so introduced is just as often called product. The 
official definition is easy to state: if S and T are linear transformations on 
the same vector space, then the composition of S and T, denoted by 


ST, 


is the transformation that sends each vector v in the space to the vector 
obtained by applying T to v and then applying S to the result. 

Caution: the order of events is important. What would have happened 
to the polynomial 2 + 3x + 4x^2 if it had been multiplied by x first and then
differentiated? Answer: the multiplication would have produced


2x + 3x^2 + 4x^3,
and then the differentiation would have produced
2 + 6x + 12x^2,
which is not at all the same as the
3x + 8x^2
obtained before. In other words:
MD ≠ DM.
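
The two orders of events can be compared by machine. In this sketch a polynomial is stored as its list of coefficients [c0, c1, c2, ...], which is an assumption of the illustration; D and M are the differentiation and multiplication-by-x transformations of the text, written as Python functions.

```python
def D(p):
    """Differentiation on coefficient lists."""
    return [k * p[k] for k in range(1, len(p))]

def M(p):
    """Multiplication by the variable: every coefficient moves up one place."""
    return [0] + list(p)

p = [2, 3, 4]  # 2 + 3x + 4x^2

MD_p = M(D(p))  # differentiate first, then multiply by x
DM_p = D(M(p))  # multiply by x first, then differentiate

print(MD_p)  # [0, 3, 8], that is, 3x + 8x^2
print(DM_p)  # [2, 6, 12], that is, 2 + 6x + 12x^2
```

Note that the code, like the symbol MD, is read from right to left: in M(D(p)) it is D that acts first.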


For a different enlightening example, consider the multiplication 
transformation N defined on P by 


Np(x) = (1 - 3x^2)p(x).
In that case
N(2 + 3x + 4x^2) = (1 - 3x^2)(2 + 3x + 4x^2)
= 2 + 3x - 2x^2 - 9x^3 - 12x^4,


and the result of applying M to that is 
2x + 3x^2 - 2x^3 - 9x^4 - 12x^5.
On the other hand


M(2 + 3x + 4x^2) = 2x + 3x^2 + 4x^3,
and the result of applying N to that is


(1 - 3x^2)(2x + 3x^2 + 4x^3) = 2x + 3x^2 - 2x^3 - 9x^4 - 12x^5


—the same thing. This could have been obvious without any calculation: 
the first result was obtained by two multiplications (1 - 3x^2 followed by
x), as was the second (x followed by 1 - 3x^2). Since the order in which
multiplications are performed doesn't matter, it is no surprise that 


MN = NM. 


The result is described by saying that the linear transformations M 
and N commute (or they are commutative). The example of M and D 
shows that linear transformations may fail to commute—they may be 
non-commutative. 
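
For the commuting pair the calculation can be repeated by machine. The sketch below again stores a polynomial as its list of coefficients (an assumption of the illustration) and builds both M and N from one helper, invented here, that multiplies two polynomials.

```python
def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, s in enumerate(a):
        for j, t in enumerate(b):
            out[i + j] += s * t
    return out

def M(p):
    """Multiplication by x."""
    return poly_mul([0, 1], p)

def N(p):
    """Multiplication by 1 - 3x^2."""
    return poly_mul([1, 0, -3], p)

p = [2, 3, 4]  # 2 + 3x + 4x^2

print(N(p))     # [2, 3, -2, -9, -12]
print(M(N(p)))  # [0, 2, 3, -2, -9, -12]
print(N(M(p)))  # [0, 2, 3, -2, -9, -12]
```

One polynomial proves nothing in general, but here the general reason is the one given above: both M and N are multiplications, and the order of multiplications doesn't matter.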

Can transformation multiplication be defined for linear transfor- 
mations that go from one vector space to a different one? Yes— 
sometimes. If U and V are vector spaces, and if T is a linear transformation 
from U to V, then it makes sense to follow an action of T by another linear
transformation, say S, but only if S starts where T left off, or, in more
dignified language, if the domain of S includes the range of T. If, in other
words, for each vector u in U, the image Tu belongs to the domain of S,
then STu makes sense, and if, to be specific, the range of S is included in
a vector space W, then the product ST is defined and is a linear
transformation from U to W.

Here is an example. Suppose that U is R^3, V is R^2, and W is R^1; let T
be the linear transformation from R^3 to R^2 defined by


T(x, y, z) = (x, y),
and let S be the linear transformation from R^2 to R^1 defined by
S(x, y) = x + y.
In that case
ST(x, y, z) = S(x, y) = x + y.


Note: the product TS cannot be defined. The only way a symbol such as
TS could be interpreted is as a transformation that starts where S starts,
that is, as a transformation with domain R^2. But then the result of applying
S yields a vector in R^1, and it does not make sense to apply T to it: T can
only work on vectors in R^3. If the product of two linear transformations S
and T is defined, is it clear how to read a symbol such as ST? A possible
source of confusion, which should be avoided, is that the symbol must be
read “backward”, from right to left. To see how ST acts on a vector, let T
act on it first and then, second, let S act on the result. The reason is that
transformations are, after all, functions, and the usual functional notation (as 
in f(x)) puts the name of the function next to the name of the variable and 
to the left of it. Students of mathematics realize early that if 
f(r)-2? and g(x) =2+2, 
then 
f(g(z)) = (z +2}; 
the first function to act is the one next to z—the one on the right. (The 


other order would yield g( f (z)), which is x? + 2—not at all the same thing. 
Non-commutativity raises its head again.) 
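
The right-to-left reading can be tried out directly. The sketch below assumes
vectors are Python tuples; the helper name compose is an illustrative choice,
and the failure of TS is detected simply by the attempt to unpack a vector of
the wrong length.

```python
def T(v):
    """T from R^3 to R^2: forget the last coordinate."""
    x, y, z = v
    return (x, y)

def S(v):
    """S from R^2 to R^1: add the two coordinates."""
    x, y = v
    return (x + y,)

def compose(outer, inner):
    """Return the product 'outer inner': inner acts first, outer second."""
    return lambda v: outer(inner(v))

ST = compose(S, T)
print(ST((1, 2, 3)))      # (3,)

TS = compose(T, S)
try:
    TS((1, 2))
    ts_defined = True
except ValueError:
    # S produces a vector in R^1, and T cannot act on it.
    ts_defined = False
print(ts_defined)         # False

f = lambda x: x ** 2
g = lambda x: x + 2
print(f(g(3)), g(f(3)))   # 25 11
```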


Problem 57. (1) If S is the stretching transformation on $R^2$,

$$S(x, y) = (7x, 7y),$$

(see Problem 54) and T is the transformation on $R^2$ defined by

$$T(x, y) = (2x + 3y, 7x - 5y),$$

do S and T commute?
(2) If S is the stretching transformation on $R^3$,

$$S(x, y, z) = (7x, 7y, 7z),$$

and T is the “projection” transformation from $R^3$ to $R^2$ defined by

$$T(x, y, z) = (x, y),$$

do S and T commute?
(3) If S is the change of variables on P defined by

$$Sp(x) = p(x^2),$$

and T is the multiplication transformation defined by

$$Tp(x) = x^2 p(x),$$


do S and T commute?
(4) If S is the transformation from $R^2$ to $R^1$ defined by

$$S(x, y) = x + 2y,$$

and T is the transformation from $R^1$ to $R^2$ defined by

$$T(x) = (x, x),$$

do S and T commute?
(5) If S is the change of variables defined on $P_3$ by

$$Sp(x) = p(x + 2),$$

and T is the transformation defined by

$$T(\alpha + \beta x + \gamma x^2 + \delta x^3) = \alpha + \gamma x^2$$

(for all $\alpha$, $\beta$, $\gamma$, $\delta$), do S and T commute?
(6) In each of the preceding cases, what are the domains, ranges,
and kernels of ST and TS (when they make sense)?


58. Range inclusion and factorization


When is one linear transformation divisible by another? In view of the
difference between right and left, the question doesn't quite make sense, but
once divisibility is interpreted one way or the other then it does.

Suppose, for instance, that A is called left divisible by B in case there
exists a linear transformation T such that A = BT. One relation between
A and B is an immediate consequence, namely that

$$\operatorname{ran} A \subset \operatorname{ran} B.$$

Is that necessary condition for left divisibility sufficient also?


Problem 58. If two linear transformations A and B on a vector
space are such that $\operatorname{ran} A \subset \operatorname{ran} B$, does it follow that A is left
divisible by B? What is the analogous necessary condition for right
divisibility? Is it sufficient?


59. Transformations as vectors


Two transformations A and B on a vector space can form a conspiracy: for 
each vector v the results of the actions of A and B on v can be added to 
yield a new vector. The result of this conspiracy assigns a vector, call it Sv, 
to each starting vector v; in other words, the passage S from v to Av + Bv
is a transformation defined on the underlying vector space V. It is natural
to call the transformation S the sum of A and B. The commutativity and 
associativity of addition in V imply immediately that the addition of linear 
transformations is commutative and associative. 

Much more than that is true. The sum of any linear transformation A
and the linear transformation 0 (see Problem 56) is A. If for each linear
transformation A, the symbol $-A$ is used to denote the transformation
defined by

$$(-A)v = -(Av),$$

then

$$A + (-A) = 0,$$

and the linear (!) transformation $-A$ is uniquely characterized by that
property. To sum up: the set of all linear transformations on V is an abelian
group with respect to the operation of addition.

If, moreover, for any scalar $\alpha$ and any linear transformation A a
product $\alpha A$ is defined by

$$(\alpha A)v = \alpha(Av),$$

it follows that the set of all linear transformations on a vector space V is
itself a vector space; a usable symbol for it might be L(V).

The set L(V) has a structural property that not every vector space has,
namely it has a natural multiplication defined on it, the composition of
linear transformations. If A and B are linear transformations, then not
only are $\alpha A$ and A + B linear transformations, but so also is AB; it is
possible to form not only linear combinations of linear transformations,
but also linear combinations of powers of a single linear transformation.
“Linear combinations of powers” is a long way to say polynomials; if, that
is,

$$p(x) = \alpha_0 + \alpha_1 x + \cdots + \alpha_n x^n$$

is a polynomial, and A is a linear transformation, then p(A) makes sense:

$$p(A) = \alpha_0 + \alpha_1 A + \cdots + \alpha_n A^n.$$


How are the linear and multiplicative properties of L(V) related to the 
vector space properties of V? 
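
To make the expression p(A) concrete, here is a small sketch that evaluates a
polynomial at a square matrix. It assumes that the constant term is read as
that scalar times the identity transformation, and it uses Horner's rule for
the evaluation, which is a choice of convenience; the names mat_mul and
poly_of_matrix are hypothetical.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_of_matrix(coeffs, a):
    """Evaluate coeffs[0]*1 + coeffs[1]*A + ... + coeffs[n]*A^n by Horner's rule."""
    n = len(a)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, a)   # multiply what has been built so far by A
        for i in range(n):
            result[i][i] += c         # add c times the identity
    return result

A = [[0, 1],
     [0, 0]]
# p(x) = 3 + 2x + 5x^2; since A^2 = 0 for this A, p(A) = 3*1 + 2*A.
pA = poly_of_matrix([3, 2, 5], A)
print(pA)   # [[3, 2], [0, 3]]
```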


Problem 59. (1) What can be said about the dimension of L(V)
in terms of the dimension of V?

(2) If A is a linear transformation on a finite-dimensional vector
space, does there always exist a non-zero polynomial p such that
p(A) = 0?

(3) If $x_0$ is a vector in a vector space V, and $y_0$ is a linear
functional on V, and if Ax is defined for every x in V by

$$Ax = y_0(x)x_0,$$

then A is a linear transformation on V; what is the smallest possible
degree of a polynomial p such that p(A) = 0?


60. Invertibility 


Addition can be undone (reversed) by subtraction; can the multiplication
of linear transformations also be undone somehow? Is there a process
like division for linear transformations? Central special case: can the
identity transformation 1 be “divided” by any other linear transformation
T, or, in other words, does a linear transformation always have a
“reciprocal”?

Suppose that T is a transformation on a vector space V, so that T
assigns to each vector in V another vector (or possibly the same one) in V
again. A candidate for a reciprocal should presumably be a transformation
that goes backward, in the sense that it assigns to each vector in V the
vector that it comes from. That may sound plausible, but there is a catch in
it; two catches, in fact. To be able to go backward, each vector in V must
be the image under T of something (in other words, ran T must be V),
and it must make sense to speak of the vector that a vector comes from
(in other words, T must never send two different vectors onto the same
vector). Equivalent language: the transformation T must map V onto V
(technical word: T is surjective), and T must do so in a one-to-one manner
(technical word: T is injective).

A typical violent counterexample (in case the vector space V is not the
trivial space O) is the linear transformation 0: it fails to satisfy the
conditions just described in the worst possible way. Not only is it false
that 0 is surjective, but in fact ran 0 is as small as can be (it consists of
one vector only), and not only is it false that 0 is injective, but in fact 0
is as many-to-one as can be (it sends every vector onto the same one). Here
are a couple of other examples that are bad enough but not quite that bad:
the linear transformations A and B defined for every vector $\binom{\xi}{\eta}$ in $R^2$ by

[the two displayed formulas defining A and B are illegible in the source]

(Vectors are written vertically, as $\binom{\xi}{\eta}$, or horizontally, as
$(\xi, \eta)$, ad lib. The vertical symbol will seem more appropriate when
matrices enter the picture, but the horizontal one is typographically more
convenient.) The range of A consists only of vectors whose two coordinates
are equal, and that's nowhere near all of $R^2$. As for B, the vector (0, 0)
comes from infinitely many vectors; the transformation is nowhere near
injective. (Question: do A and B fail for just one reason each, or for two?
That is: granted that A is not surjective, is it at least injective? And:
granted that B is not injective, is it at least surjective?)

A linear transformation T on a vector space V is called invertible if it
is both injective and surjective. If T is invertible, then the transformation
that assigns to each vector v in V the unique vector u that it comes from
(that is, the unique vector u such that Tu = v) is called the inverse of T; it
is denoted by $T^{-1}$ (pronounced “T inverse”).

If an invertible T acts on a vector v, what happens when the inverse
$T^{-1}$ acts on the result? The answer is obvious: $T^{-1}$ sends Tv back to v,
so that $T^{-1}(Tv) = v$. Does it work the other way? In other words, what
happens when T is made to go backward before it goes forward? That may
be colorful language, but it's sloppy enough to be dangerous. What the
question really asks is: what is $T(T^{-1}v)$? To find the answer, write
$u = T^{-1}v$, which says the same as Tu = v, and then unscramble the notation:

$$T(T^{-1}v) = Tu = v.$$

Another way of expressing the same question (and its answer) is to note
that the inverse of a transformation, as here defined, is really a left inverse,
and to ask whether it works from the right also. The question makes sense
(since both T and $T^{-1}$ map V into V, the product can be formed in either
order), and, as the discussion above shows, the answer is yes.

The simplest example of an invertible transformation is 1, the identity; 
it is obviously both injective and surjective, and it is its own inverse. 


Is the linear transformation T defined for all $\binom{\xi}{\eta}$ in $R^2$ by

$$T\binom{\xi}{\eta} = \binom{2\xi + \eta}{\xi + \eta}$$

invertible? That's two questions: is T surjective?, and is T injective? But,
as it turns out, an examination of surjectivity alone yields the full answer.
The surjectivity question for T reduces to the solvability of the equations

$$2\xi + \eta = \alpha$$
$$\xi + \eta = \beta$$

for all $\alpha$ and $\beta$. That's a routine matter: $\xi$ must be $\alpha - \beta$ and $\eta$ must be
$-\alpha + 2\beta$. That is: the candidate for the inverse of T is the linear
transformation S defined by

$$S\binom{\alpha}{\beta} = \binom{\alpha - \beta}{-\alpha + 2\beta}.$$

To check whether the candidate works, form the product ST. That is: find
ST, at each $\binom{\xi}{\eta}$, by forming $S\binom{\alpha}{\beta}$, where

$$\binom{\alpha}{\beta} = \binom{2\xi + \eta}{\xi + \eta}.$$

The result of the substitution is

$$ST\binom{\xi}{\eta} = \binom{(2\xi + \eta) - (\xi + \eta)}{-(2\xi + \eta) + 2(\xi + \eta)} = \binom{\xi}{\eta},$$
and that does it—indeed, ST = 1. 
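
The substitution can be repeated numerically. The following sketch codes the
transformation T above and the candidate S as Python functions on pairs of
integers (the sample vectors are arbitrary choices) and checks the products in
both orders.

```python
def T(v):
    """T(xi, eta) = (2*xi + eta, xi + eta)."""
    xi, eta = v
    return (2 * xi + eta, xi + eta)

def S(w):
    """Candidate inverse: S(alpha, beta) = (alpha - beta, -alpha + 2*beta)."""
    alpha, beta = w
    return (alpha - beta, -alpha + 2 * beta)

samples = [(0, 0), (1, 0), (0, 1), (3, -5), (-2, 7)]
left_ok = all(S(T(v)) == v for v in samples)    # ST acts as the identity
right_ok = all(T(S(v)) == v for v in samples)   # TS acts as the identity
print(left_ok, right_ok)   # True True
```

Because S and T are both linear, agreement on the two basis vectors (1, 0) and
(0, 1) already settles the matter; the extra samples are only for reassurance.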


These examples suggest a question to which the answer must be known 
before the theory can proceed. 


Problem 60. Must the inverse of an invertible linear transformation 
be a linear transformation? 


61. Invertibility examples 


Problem 61. (1) Is the linear transformation defined by

[displayed formula illegible in the source]

invertible?
(2) What about

[displayed formula illegible in the source]


(3) Is the differentiation transformation D on the vector space 
P; invertible?


62. Determinants: 2 x 2


If $\alpha$ and $\xi$ are known numbers, can the equation

$$\alpha x = \xi$$

be solved for x? Maybe. If $\alpha$ is not 0, then all that's needed is to divide by
it. If $\alpha = 0$, there is trouble: if $\xi \ne 0$, the equation has no solutions, and if
$\xi = 0$, it has too many: every x works.

The question could have been asked this way: if $\alpha$ is a known scalar,
and if a linear transformation T is defined on $R^1$ by

$$Tx = \alpha x,$$

is T invertible? The answer is yes if and only if $\alpha \ne 0$.

That answer is in fact the answer to a very special case of a very general 
question. The general question is this: how do the entries of an n x n matrix 
determine whether or not it is invertible? When n = 1, there is only one 
entry and the answer is that the matrix is invertible if and only if that entry 
is different from 0. 

What happens when n = 2? In other words, under what conditions on
the numbers $\alpha$, $\beta$, $\gamma$, $\delta$ is the 2 x 2 matrix

$$M = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$$

invertible, or, equivalently, when can the equations

$$\alpha x + \beta y = \xi$$
$$\gamma x + \delta y = \eta$$

be solved for x and y?
To find x, eliminate y, which means multiply the first equation by $\delta$,
the second by $\beta$, and subtract the second from the first to get

$$(\alpha\delta - \beta\gamma)x = \delta\xi - \beta\eta.$$

To find y, multiply the first equation by $\gamma$, the second by $\alpha$, and subtract
the first from the second to get

$$(\alpha\delta - \beta\gamma)y = \alpha\eta - \gamma\xi.$$


If $\alpha\delta - \beta\gamma \ne 0$, then all that's needed is to divide by it, but if
$\alpha\delta - \beta\gamma = 0$, there is trouble: the results obtained give no information
about x and y. If either one of $\delta\xi - \beta\eta$ or $\alpha\eta - \gamma\xi$ is not 0, the equations
have no solutions, and if both are 0, they have too many: every pair (x, y) works.

The expression $\alpha\delta - \beta\gamma$ is called the determinant of M, abbreviated

$$\det M,$$

and the answer to the 2 x 2 question is that M is invertible if and only if
$\det M \ne 0$. The function det has some obvious properties and some
non-obvious ones. An obvious property is that

det 1 = 1, 


and, more generally, that if M is diagonal (that is, $\beta = \gamma = 0$), then det M
is the product of the diagonal entries. Another easy property is the
behavior of det with respect to scalar multiplication:

$$\det(\lambda M) = \lambda^2 \det M$$

for every real scalar $\lambda$.
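A non-obvious and possibly surprising property of det is that it is
multiplicative, which means that if $M_1$ and $M_2$ are two 2 x 2 matrices, then

$$\det(M_1 \cdot M_2) = (\det M_1) \cdot (\det M_2).$$

If, in other words,

$$M_1 = \begin{pmatrix} \alpha_1 & \beta_1 \\ \gamma_1 & \delta_1 \end{pmatrix} \quad\text{and}\quad M_2 = \begin{pmatrix} \alpha_2 & \beta_2 \\ \gamma_2 & \delta_2 \end{pmatrix},$$

so that

$$M_1 \cdot M_2 = \begin{pmatrix} \alpha_1\alpha_2 + \beta_1\gamma_2 & \alpha_1\beta_2 + \beta_1\delta_2 \\ \gamma_1\alpha_2 + \delta_1\gamma_2 & \gamma_1\beta_2 + \delta_1\delta_2 \end{pmatrix},$$

then it is to be proved that

$$(\alpha_1\alpha_2 + \beta_1\gamma_2)(\gamma_1\beta_2 + \delta_1\delta_2) - (\alpha_1\beta_2 + \beta_1\delta_2)(\gamma_1\alpha_2 + \delta_1\gamma_2)$$
$$= (\alpha_1\delta_1 - \beta_1\gamma_1)(\alpha_2\delta_2 - \beta_2\gamma_2),$$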


and to prove that, there is no help for it—compute. 
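
For a reader who would rather let a machine do the computing, the identity can
at least be spot-checked. The sketch below is a numerical check on random
integer matrices, not a proof; the function names det2 and mul2 and the use of
a fixed random seed are illustrative assumptions.

```python
import random

def det2(m):
    """Determinant of a 2 x 2 matrix ((a, b), (c, d))."""
    (a, b), (c, d) = m
    return a * d - b * c

def mul2(m1, m2):
    """Product of two 2 x 2 matrices."""
    (a1, b1), (c1, d1) = m1
    (a2, b2), (c2, d2) = m2
    return ((a1 * a2 + b1 * c2, a1 * b2 + b1 * d2),
            (c1 * a2 + d1 * c2, c1 * b2 + d1 * d2))

def random_matrix():
    """A 2 x 2 matrix with small random integer entries."""
    return tuple(tuple(random.randint(-9, 9) for _ in range(2)) for _ in range(2))

random.seed(0)
pairs = [(random_matrix(), random_matrix()) for _ in range(1000)]
all_equal = all(det2(mul2(m1, m2)) == det2(m1) * det2(m2) for m1, m2 in pairs)
print(all_equal)   # True
```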
Multiplicativity has a pleasant consequence about inverses. If M happens
to be invertible, then

$$(\det M) \cdot (\det M^{-1}) = \det(MM^{-1}) = \det 1 = 1,$$

so that

$$\det M^{-1} = \frac{1}{\det M}.$$

Can the results about the determinants of 2 x 2 matrices be generalized to
bigger sizes? One tempting direction of generalization is this: replace the
entries $\alpha$, $\beta$, $\gamma$, $\delta$ of a 2 x 2 matrix by 2 x 2 matrices $(\alpha)$, $(\beta)$, $(\gamma)$, $(\delta)$, thus
getting a 4 x 4 matrix

$$M = \begin{pmatrix} (\alpha) & (\beta) \\ (\gamma) & (\delta) \end{pmatrix}.$$

The attempt to generalize the concept of determinant runs into a puzzle
before it can get started: since the matrix entries may fail to commute, it is
not clear whether the generalized determinant ought to be defined as

$$(\alpha)(\delta) - (\beta)(\gamma), \quad\text{or}\quad (\alpha)(\delta) - (\gamma)(\beta), \quad\text{or}$$
$$(\delta)(\alpha) - (\beta)(\gamma), \quad\text{or}\quad (\delta)(\alpha) - (\gamma)(\beta).$$

These four “formal determinants” seem to play equal roles: no reason to
prefer one to the others is apparent. The best thing to do therefore is to
be modest: assume as much as possible and conclude as little as possible.
Here is one possible question.


Problem 62. Which (if either) of the following two assertions is
true?

(1) If a 2 x 2 matrix M of 2 x 2 matrices is invertible, then at
least one of its formal determinants is invertible.

(2) If all the formal determinants of a 2 x 2 matrix M of 2 x 2
matrices are invertible, then M is invertible.


63. Determinants: n x n 


When linear algebra began it wasn’t called “linear algebra”—the central 
part of the subject was thought to be the concept of determinant. Nowadays 
determinant theory is considered a very small part of linear algebra. One 
reason is that this part of linear algebra is not really linear—the subject 
is an intricately combinatorial one that some people love for that reason, 
while others insist that the only elegant way to proceed is to avoid it when- 
ever possible. Every one admits, however, that a little of it must be known 
to every student of the subject, and here comes a little. 

It is not obvious how the definition of determinant for 2 x 2 matrices
can be extended to n x n matrices. Even the basic definition with which
most treatments begin is a little frightening. If $M = (\alpha_{ij})$ is an n x n matrix,
the cofactor of an entry $\alpha_{ij}$, call it $A_{ij}$, is the (n - 1) x (n - 1) matrix
obtained by removing from M the entire row and the entire column that
contain $\alpha_{ij}$, that is, removing row i and column j. Thus, for instance, the
cofactor of $\alpha$ in $\begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$ is the 1 x 1 matrix $(\delta)$, and the cofactor of $\gamma$ is the
1 x 1 matrix $(\beta)$. The standard definition of determinants uses cofactors
to proceed inductively. The determinant of a 1 x 1 matrix (x) is defined to
be the number x, and the determinant of an n x n matrix, in terms of its
(n - 1) x (n - 1) submatrices (written here for n = 4), is given by

$$\det M = \alpha_{11} \cdot \det A_{11} - \alpha_{21} \cdot \det A_{21} + \alpha_{31} \cdot \det A_{31} - \alpha_{41} \cdot \det A_{41}.$$

In words: multiply each element of the first column by the determinant of
its cofactor, and then form the sum of the resulting products with
alternating signs. Special case:

$$\det \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} = \alpha \cdot \det(\delta) - \gamma \cdot \det(\beta),$$

which agrees with the $\alpha\delta - \beta\gamma$ definition of Problem 62.

Important comment: the definition does not have to be based on the
first column; with a small modification any other column gives the same
result. The modification has to do with signs. Think of the entries of a
matrix as having signs painted on them: the sign painted on $\alpha_{ij}$ is to be
plus if i + j is even and minus if i + j is odd. (Equivalently: the sign of
$\alpha_{ij}$ is the sign of the number $(-1)^{i+j}$.) Thus, $\alpha_{11}$ and $\alpha_{31}$ are plus whereas
$\alpha_{21}$ and $\alpha_{41}$ are minus. The definition of determinant in terms of column
j instead of column 1 (written here for n = 4) is either

$$\alpha_{1j} \cdot \det A_{1j} - \alpha_{2j} \cdot \det A_{2j} + \alpha_{3j} \cdot \det A_{3j} - \alpha_{4j} \cdot \det A_{4j}$$

or the negative of that depending on whether the sign of $\alpha_{1j}$ is plus or
minus. Example: if

$$M = \begin{pmatrix} \alpha_{11} & \alpha_{12} & \alpha_{13} \\ \alpha_{21} & \alpha_{22} & \alpha_{23} \\ \alpha_{31} & \alpha_{32} & \alpha_{33} \end{pmatrix},$$

then

$$\det M = \alpha_{11}(\alpha_{22}\alpha_{33} - \alpha_{23}\alpha_{32}) - \alpha_{21}(\alpha_{12}\alpha_{33} - \alpha_{13}\alpha_{32}) + \alpha_{31}(\alpha_{12}\alpha_{23} - \alpha_{13}\alpha_{22}),$$

and also

$$\det M = -\alpha_{12}(\alpha_{21}\alpha_{33} - \alpha_{23}\alpha_{31}) + \alpha_{22}(\alpha_{11}\alpha_{33} - \alpha_{13}\alpha_{31}) - \alpha_{32}(\alpha_{11}\alpha_{23} - \alpha_{13}\alpha_{21}).$$

These formulas are called the expansion of det M in terms of the first and
the second columns; there is also, of course, an expansion in terms of the
third column, and there are completely similar expansions in terms of the
rows.

The assertion that these various signed sums all give the same answer 
(in every case, not only for 3 x 3 matrices) needs proof, but the proof is 
not likely to elevate the reader's soul and is omitted here. Comment: if all 
parentheses are removed, so that the expansion of the determinant of a 
3 x 3 matrix is written out in complete detail, then each of the resulting six 
terms is a product (with its appropriate sign) of exactly one element from 
each row and each column. A similar statement is true about n x n matrices 
for every n: the value of the determinant is the sum, with appropriate signs, 
of the n! products that contain exactly one factor from each row and each 
column. A description such as that is sometimes used as an alternative 
definition of determinants. 
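
Both descriptions of the determinant are easy to turn into a short program.
The sketch below is one possible rendering, with 0-based indices (which leaves
the parity of i + j, and hence the sign, unchanged); the function names and
the sample matrix are illustrative choices, and the recursion is meant for
small matrices only.

```python
from itertools import permutations

def minor(m, i, j):
    """Matrix obtained from m by removing row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det_by_column(m, j=0):
    """Expand the determinant along column j (0-based), recursively."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** (i + j) * m[i][j] * det_by_column(minor(m, i, j))
               for i in range(len(m)))

def det_by_permutations(m):
    """Sum of the n! signed products, one factor from each row and column."""
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        # The sign is determined by the number of inversions of the permutation.
        inversions = sum(perm[a] > perm[b]
                         for a in range(n) for b in range(a + 1, n))
        product = 1
        for row, col in enumerate(perm):
            product *= m[row][col]
        total += (-1) ** inversions * product
    return total

M = [[2, 0, 1],
     [1, 3, -1],
     [0, 5, 4]]
print([det_by_column(M, j) for j in range(3)], det_by_permutations(M))
# [39, 39, 39] 39
```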

Another (definitely non-trivial) assertion is that (for matrices of all
sizes) determinant is multiplicative: if $M_1$ and $M_2$ are two n x n matrices,
then

$$\det(M_1 \cdot M_2) = (\det M_1) \cdot (\det M_2).$$


An easier property (with its easier proof also omitted) is that the
determinant of an upper triangular matrix, such as for instance

$$\begin{pmatrix} * & * & * & * \\ 0 & * & * & * \\ 0 & 0 & * & * \\ 0 & 0 & 0 & * \end{pmatrix},$$

is the product of the diagonal entries. (The picture is intended to convey
that the entries below the main diagonal are 0 while the entries on and
above are arbitrary. A similar picture and similar comment apply to lower
triangular matrices also.) Special case: the determinant of a diagonal
matrix is the product of the diagonal entries. Special case of that special
case: the determinant of a scalar matrix $\lambda \cdot 1$ is $\lambda^n$, and, in particular, the
determinant of 1 is 1. For invertible matrices

$$\det M^{-1} = \frac{1}{\det M},$$

and, for any matrix,

$$\det(\lambda M) = \lambda^n \det M$$

(where $\lambda$ is an arbitrary scalar).

Far and away the most useful property of determinants is that
$\det M \ne 0$ if and only if M is invertible. (The proof is just as omitted as
the other proofs so far in this section.) That property is probably the one
that is most frequently exploited, and it will indeed be exploited at some
crucial spots in the sequel.

The actual numerical values of determinants are rarely of any inter- 
est; what is important is the theoretical information that the existence of 
a function of matrices with the properties described above implies. The 
questions that follow, however, do have to do with the numerical evalua- 
tion of some determinants—they are intended to induce at least a small 
familiarity with (and perhaps a friendly feeling toward) such calculations. 


Problem 63. If $M_1$, $M_2$, and $M_3$ are the matrices below, what are
the values of $\det M_1$, $\det M_2$, and $\det M_3$?

[the three displayed matrices are illegible in the source]


64. Zero-one matrices 


Matrices with only a small number of different entries intrigue
mathematicians: such matrices have some pleasant, and curious, and sometimes
surprising properties. Suppose, for instance, that every entry of an n x n
matrix is either 0 or 1; how likely is such a matrix to be invertible? It is
clear that it doesn't have to be. The extreme example is the matrix in which
every entry is equal to 0, the zero matrix. Equally clearly a matrix of 0's
and 1's (would “01-matrix” be a useful abbreviation?) can succeed in being
invertible; the trivial example is the identity matrix. It seems reasonable
that the more zeros a matrix has, the more likely it is that its determinant
is 0, and hence the less likely the matrix is to be invertible. What in fact is
the largest number of 0's that a 01-matrix of size n x n can have and still
be invertible? Equivalently: what is the smallest number of 1's that a
01-matrix of size n x n can have and still be invertible? That's easy: the answer
is n. The number of 1's in the identity matrix is n; if a matrix has fewer 1's,
then at least one of its rows must consist of 0's only, which implies that its
determinant is 0. (Why?) Consequence: the maximum number of 0's that
an invertible 01-matrix can have is $n^2 - n$.

What about 1's? An n x n matrix has $n^2$ entries; if they are all equal to
1, then, for sure, it is not invertible; in fact its determinant is 0. What is the
largest number of 1's that an n x n 01-matrix can have and be invertible?
Triangular examples such as

$$\begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

suggest the conjecturable answer

$$n^2 - (1 + \cdots + (n - 1)) = \frac{n(n + 1)}{2}.$$

Is that right?


Problem 64. What is the largest number of 1’s that an invertible 
01-matrix of size n x n can have? 


Comment. The displayed formula has a not uncommon problem at its
lowest value: when n = 1, the left side has to be interpreted with some
kindness.


65. Invertible matrix bases 


The set L(V) of all linear transformations of an n-dimensional real vector
space V to itself, or, equivalently, the set of all n x n real matrices, is a
vector space that has a basis consisting of $n^2$ elements (see Problem 59).
How good can that basis be?


Problem 65. Does L(V) have a basis consisting of invertible linear 
transformations?


66. Finite-dimensional invertibility


The two conditions needed for invertibility are easy to characterize
individually: to say that a linear transformation T on a vector space V is
surjective means just that ran T = V, and to say that it is injective means that
ker T = {0}. The first statement is nothing but language: that's what the
words mean. The second statement is really two statements: (1) if T is
injective, then ker T = {0}, and (2) if ker T = {0}, then T is injective. Of
these (1) is obvious: if T is injective, then, in particular, T can never send
a vector other than 0 onto 0. Part (2) is almost equally obvious: if T never
sends a non-zero vector onto 0, then T cannot send two different vectors $u_1$
and $u_2$ onto the same vector, because if it did, then (by linearity) T would
send $u_1 - u_2$ onto 0.

The interesting question along these lines is whether there is any rela- 
tion between the two conditions. Is it possible for T to be surjective but not 
injective? What about the other way around? These questions, and their 
answers, turn out to have a deep (and somewhat complicated) theory in 
the infinite parts of linear algebra. In the finite-dimensional case the facts 
are nearer to the surface. 


Problem 66. Can a linear transformation on a finite-dimensional
vector space be injective but not surjective? How about the other way
around?


67. Matrices


A matrix is a square array of scalars, such as
-3 7 i m 
3 0 0 1 
V10 yr e? 19 
0 1 0 + 


This example is called a 4 x 4 matrix (pronounced "four by four"); all other 
sizes (2 x 2, 11 x 11) are equally legitimate. An extreme case is a 1 x 1 
matrix, which is hard to tell apart from a scalar. In some contexts (slightly 
more general than the ones here considered) even rectangular matrices 
are allowed (instead of only square ones). 

Matrices have already occurred in these problems. Solution 59 shows
that a basis $\{e_1, e_2, \ldots, e_n\}$ of a finite-dimensional vector space can be used
to establish a one-to-one correspondence between all linear transformations
A on that space and all matrices $(\alpha_{ij})$. The relation between a linear
transformation A and its matrix $(\alpha_{ij})$ is that

$$Ae_j = \sum_{i=1}^{n} \alpha_{ij} e_i$$

for each j = 1, 2, ..., n. What is already known is that the linear
properties of linear transformations correspond just fine to the linear
properties of matrices: if two transformations A and B have matrices $(\alpha_{ij})$ and
$(\beta_{ij})$, then a transformation such as $\xi A + \eta B$ has the matrix
$(\xi\alpha_{ij} + \eta\beta_{ij})$. What is not known is how matrices behave under
multiplication: what, to be definite, is the matrix of AB? The only way to find
the answer is to indulge in a bit of not especially fascinating computation,
like this:

$$(AB)e_j = A(Be_j) = A\Big(\sum_k \beta_{kj} e_k\Big) = \sum_k \beta_{kj} Ae_k$$
$$= \sum_k \beta_{kj} \Big(\sum_i \alpha_{ik} e_i\Big) = \sum_i \Big(\sum_k \alpha_{ik}\beta_{kj}\Big) e_i.$$

Conclusion: the (i, j) entry of the matrix of AB is

$$\sum_k \alpha_{ik}\beta_{kj}.$$
k 


It may look complicated to someone who has never seen it before, but all 
it takes is a little getting used to—it is really quite easy to work with. 
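
As an aid to the getting used to, here is the multiplication rule written out
as a short program. It is a sketch under simple assumptions: matrices are
lists of rows, vectors are lists of coordinates, and the names mat_mul and
apply, as well as the sample matrices, are illustrative.

```python
def mat_mul(a, b):
    """The (i, j) entry of the product is the sum over k of a[i][k] * b[k][j]."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apply(a, v):
    """Coordinates of Av when v has coordinates v[0], ..., v[n-1]."""
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]

A = [[1, 2],
     [3, 4]]
B = [[0, 1],
     [5, -1]]
v = [7, -2]

AB = mat_mul(A, B)
print(AB)                                       # [[10, -1], [20, -1]]
print(apply(AB, v) == apply(A, apply(B, v)))    # True: AB means "B first, then A"
print(mat_mul(B, A) == AB)                      # False: the order matters
```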

Here is an easy example: the set $\mathbb{C}$ of complex numbers is a real (!)
vector space, the set {1, i} is a basis for that space, and the action C of
complex conjugation is a linear transformation on that space. What is the
corresponding matrix? That's two questions: what is the expansion, in terms of
{1, i}, of C1, and what is the expansion, in terms of {1, i}, of Ci? Since the
answers are obvious, C1 = 1 and Ci = -i, the matrix is

$$\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$


That's too easy. For a little more revealing example, consider the vector
space $P_4$ of all polynomials of degree 4 or less, with basis

$$e_0 = 1, \quad e_1 = t, \quad e_2 = t^2, \quad e_3 = t^3, \quad e_4 = t^4,$$

and the linear transformation A on $P_4$ defined by

$$Ax(t) = x(t + 1);$$

what is the matrix? Since

$$Ae_0 = 1,$$
$$Ae_1 = t + 1,$$
$$Ae_2 = (t + 1)^2 = t^2 + 2t + 1, \text{ etc.},$$

it follows that the matrix (look at its columns) is

$$\begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 & 4 \\ 0 & 0 & 1 & 3 & 6 \\ 0 & 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}.$$
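
The columns of the matrix of x(t) -> x(t + 1) are the binomial coefficients of
the powers of t + 1, and that observation gives a quick way to generate the
matrix. The sketch below assumes the basis 1, t, ..., t^degree in that order;
the function name shift_matrix is hypothetical.

```python
from math import comb

def shift_matrix(degree):
    """Matrix of x(t) -> x(t + 1) on polynomials of degree <= `degree`,
    with respect to the basis 1, t, ..., t^degree.
    Column j holds the coefficients of (t + 1)^j, namely comb(j, i) in row i."""
    size = degree + 1
    return [[comb(j, i) for j in range(size)] for i in range(size)]

for row in shift_matrix(4):
    print(row)
# [1, 1, 1, 1, 1]
# [0, 1, 2, 3, 4]
# [0, 0, 1, 3, 6]
# [0, 0, 0, 1, 4]
# [0, 0, 0, 0, 1]
```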


Problem 67. What happens to the matrix of a linear transformation 
on a finite-dimensional vector space when the elements of the basis 
with respect to which the matrix is computed are permuted among 
themselves? 


Comment. The considerations of invertibility introduced in Problem 60
can be formulated more simply and more naturally in terms of matrices.
Thus, for instance, the question about the transformation T of that
problem could have been asked this way: is the matrix

$$\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$$

invertible?


68. Diagonal matrices 


Some matrices are easier to work with than others, and the easiest, usually, 
are the diagonal ones—they are the ones, such as 


$$\begin{pmatrix} 5 & 0 & 0 & 0 \\ 0 & -e & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 100 \end{pmatrix},$$

in which every entry not on the main diagonal is 0. (The main diagonal
of a matrix $(\alpha_{ij})$ is the set of entries of the form $\alpha_{ii}$, that is the entries
whose two subscripts are equal.) The sum of two diagonal matrices is a
diagonal matrix, and, extra pleasant, even the multiplication formula for
diagonal matrices is extra simple. The product of two diagonal matrices is
the diagonal matrix obtained by multiplying each entry of one of them by
the corresponding entry in the other. It follows in particular that within the
territory of diagonal matrices multiplication is commutative. The product 
of two matrices usually depends on the order in which they are multiplied, 
but if A and B are diagonal matrices, then always AB = BA. 

The simplest among the diagonal matrices are the scalar matrices, 
which means the matrices that are scalar multiples of the identity: matri- 
ces in which all the diagonal entries are equal. They are the simplest to 
look at, but it happens that sometimes the extreme opposite, the diagonal 
matrices in which all diagonal entries are different, are the easiest ones to 
work with. 


Problem 68. If A is a diagonal matrix whose diagonal entries are
all different, and if B is a matrix that commutes with A, must B also 
be diagonal? 


69. Universal commutativity 


Which matrices commute with all matrices? To keep the question from be- 
ing nonsense, it must be assumed of course that a size is fixed once and for 
all and only square matrices of that size are admitted to the competition. 
One example is the identity matrix; another is the zero matrix. Which are 
the non-trivial examples? 


Problem 69. Which n x n matrices B have the property that AB = 
BA for all n x n matrices A? 


70. Invariance 


The set M of all vectors of the form

$$(0, \alpha, \beta, \beta)$$

in $R^4$ (that is, the set of all vectors whose first coordinate is 0 and whose
last two coordinates are equal) is a subspace of $R^4$, and the matrix

$$A = \begin{pmatrix} 1 & 0 & 1 & -1 \\ 2 & 1 & 2 & 1 \\ 0 & 1 & 6 & 1 \\ 3 & 1 & 3 & 4 \end{pmatrix}$$

defines a linear transformation on $R^4$. If u is a vector in M, then the vector
Au has its first coordinate equal to 0 and its last two coordinates equal
to one another (true?); in other words, it too is in M (along with u). This
phenomenon is described by saying that the subspace M is invariant under
the linear transformation A. (The facts are often stated in this language:
M is an invariant subspace of A.)

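If B is the scalar matrix

$$\begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}$$

and if u is in M, then Bu is in M; the subspace M is invariant under B
also.

If C is the diagonal matrix

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix},$$

and if u = (0, 1, 2, 2), then u is in M, but Cu is not; the subspace M is not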


invariant under C. 
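
The three invariance claims can be tested by machine. The sketch below
assumes the following readings of the displayed matrices: the rows of A are
(1, 0, 1, -1), (2, 1, 2, 1), (0, 1, 6, 1), (3, 1, 3, 4), the scalar matrix B
is 2 times the identity, and C is the diagonal matrix with diagonal
1, 2, 1, 2. The helper names apply and in_M are illustrative. Since
(0, 1, 0, 0) and (0, 0, 1, 1) span M, linearity makes it enough to test
those two vectors.

```python
def apply(a, v):
    """Coordinates of Av for a matrix a (list of rows) and a vector v."""
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]

def in_M(v):
    """Membership in M: first coordinate 0 and last two coordinates equal."""
    return v[0] == 0 and v[2] == v[3]

# Assumed readings of the three matrices discussed in the text.
A = [[1, 0, 1, -1],
     [2, 1, 2, 1],
     [0, 1, 6, 1],
     [3, 1, 3, 4]]
B = [[2 if i == j else 0 for j in range(4)] for i in range(4)]
C = [[1, 0, 0, 0],
     [0, 2, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 2]]

spanning = [[0, 1, 0, 0], [0, 0, 1, 1]]   # a basis of M
invariant = {name: all(in_M(apply(mat, u)) for u in spanning)
             for name, mat in (("A", A), ("B", B), ("C", C))}
print(invariant)   # {'A': True, 'B': True, 'C': False}
```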

The concept of invariance plays a central role in linear algebra (or, for
that matter, in all mathematics). The most primitive questions about
invariance are those of counting: how many invariant subspaces are there?
If, for instance, the underlying vector space V is the trivial one, that is the
space O, then every linear transformation has exactly one invariant
subspace (namely O itself). If V is $R^1$ and if A is the identity transformation
on $R^1$, then there are just two invariant subspaces (namely O and $R^1$).


Problem 70. Is there a vector space V and a linear transformation 
A on V, such that A has exactly three invariant subspaces? 


71. Invariant complements 


Invariant subspaces can be used to simplify the appearance of matrices. If
A is a linear transformation on a vector space V of dimension n, say, and
if M is a subspace of V invariant under A, of dimension m, say, then there
exists a basis $\{e_1, e_2, \ldots, e_n\}$ of V such that the vectors $e_j$ (j = 1, ..., m)
belong to M. With respect to such a basis, the matrix corresponding to A
has the form

$$\begin{pmatrix} P & Q \\ 0 & R \end{pmatrix},$$

where P is an m x m matrix, R is an (n - m) x (n - m) matrix (square),
and Q is an array (a rectangular matrix) with m rows and n - m columns.
The entry 0 in the lower left corner deserves mention too: it is an array (a
rectangular matrix) with n - m rows and m columns.

The more zeroes a matrix has, the easier it is to compute with. For 
each linear transformation it is natural to look for a basis so as to make 
the corresponding matrix have many zeroes. What the preceding comment 
about matrices shows is that if the basis begins with vectors in an invariant 
subspace, then the matrix has an array of zeroes in the lower left corner. 
Under what circumstances will the other corners also consist of zeroes? 

When, for instance, will it happen that P = 0? That's easy: P = 0
means that $Ae_j = 0$ for j = 1, ..., m, so that the span M of $\{e_1, e_2, \ldots, e_m\}$
is in the kernel of A; that's necessary and sufficient.

When is Q = 0? Answer: if and only if the coefficients of the vectors
$e_1, e_2, \ldots, e_m$ in the expansion of $Ae_j$ (j = m + 1, ..., n) are all 0. Better
said: if and only if the image under A of each $e_j$ (j = m + 1, ..., n) belongs
to the span of $\{e_{m+1}, \ldots, e_n\}$. Reformulation: the span of $\{e_{m+1}, \ldots, e_n\}$
is invariant under A. In other words, a necessary and sufficient condition
that the matrix of A have the form

$$\begin{pmatrix} P & 0 \\ 0 & R \end{pmatrix}$$

with respect to some basis $\{e_1, e_2, \ldots, e_n\}$ is that for some m both the span
of $\{e_1, \ldots, e_m\}$ and the span of $\{e_{m+1}, \ldots, e_n\}$ be invariant under A. Best
answer: the matrix of A has the form $\begin{pmatrix} P & 0 \\ 0 & R \end{pmatrix}$ if and only if there exist
two complementary subspaces each of which is invariant under A.
How likely is that to happen?


Problem 71. If A is a linear transformation on a finite-dimensional
vector space, does every invariant subspace of A have an invariant 
complement? 


72. Projections 


If M and N are complementary subspaces of a finite-dimensional vector 
space V, then (Problem 28) corresponding to every vector z in V there are
two vectors x and y, with x in M and y in N, such that z = x + y, and,
moreover, x and y are uniquely determined by z. The vector x is called the
projection of z into (or just plain "to") M along (or parallel to) N, and,
similarly, y is the projection of z to N along M. 

A picture for the case in which M and N are distinct lines through the 
origin in the plane helps to see what's going on. If z is a typical point not 
on either M or N, then draw a line through z parallel to M; the point where
that parallel intersects N is the projection of z to N. (Similarly, of course,
the intersection with M of the line through z parallel to N is the projection 
of z to M.) What happens if z does belong to M or N? 

Consider the correspondence (transformation) that assigns to each z 
the vector x (call it E) and, similarly, let F be the transformation that
assigns to each z the vector y. The verification that E and F are linear 
transformations is dull routine (but necessary). Since x + y = z, it follows 
that E + F = 1. The word projection is used for the transformations E and 
F (as well as for the vectors x and y in their ranges): E is called the pro- 
jection on M (along N) and F the projection on N (along M). Warning: E 
emphasizes M, but its definition depends crucially on both M and N, and, 
similarly, F emphasizes N, but depends on both subspaces also. In other 
words, if $N_1$ and $N_2$ are two different complements of M, then the
projections $E_1$ and $E_2$ on M along $N_1$ and $N_2$ are different transformations. If
M = V (in which case N is uniquely determined as O), then E = 1; if 
M = O, then E = 0. These are trivial examples of projections; what are 
some non-trivial ones? Could it be, for instance, that every linear transfor- 
mation is the projection to some subspace along one of its complements? 
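
For two distinct lines through the origin in the plane, the decomposition
z = x + y can be computed by solving a 2 x 2 system. The sketch below is an
illustration under assumptions that are not in the text: each line is given by
a direction vector (m for M, n for N), the coordinates are integers, and the
function name split is hypothetical. It uses Cramer's rule to find the
coefficients s and t with z = s m + t n.

```python
from fractions import Fraction


def split(m, n, z):
    """Return (x, y) with z = x + y, x on the line through m, y on the line through n."""
    det = m[0] * n[1] - m[1] * n[0]
    if det == 0:
        raise ValueError("m and n do not span complementary lines")
    s = Fraction(z[0] * n[1] - z[1] * n[0], det)  # coefficient of m (Cramer's rule)
    t = Fraction(m[0] * z[1] - m[1] * z[0], det)  # coefficient of n (Cramer's rule)
    x = (s * m[0], s * m[1])
    y = (t * n[0], t * n[1])
    return x, y


m = (1, 0)   # M is the horizontal axis
z = (3, 2)
x1, y1 = split(m, (0, 1), z)   # N is the vertical axis
x2, y2 = split(m, (1, 1), z)   # N is the diagonal line
print(x1 == (3, 0), y1 == (0, 2))
print(x2 == (1, 0), y2 == (2, 2))
```

With M the horizontal axis and z = (3, 2), the projection of z to M is (3, 0)
along the vertical axis but (1, 0) along the line through (1, 1), which
illustrates the warning that E depends on both M and N.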


Problem 72. (a) Which linear transformations are projections? 
(b) If E is the projection on M along N, what are ran E and 
ker E? 


Comment. Question (a) asks for a characterization of some algebraic 
kind (as opposed to the geometric definition) that puts all projections into 
one bag and all other linear transformations into another.


73. Sums of projections 


Is the sum of two projections always a projection? Of course not—the iden- 
tity transformation 1, for instance, is a projection, but the sum of the iden- 
tity and itself, the transformation 2, is certainly not a projection. (In fact 
the double of a projection different from 0 is never a projection. Proof?) 

On the other hand the sum of two projections can often be a projec- 
tion. A trivial example is given by any projection E and the projection 0. 
An only slightly less trivial example is 


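$$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.$$

A numerically somewhat cumbersome example, in, however, the same
spirit, is given by

$$\frac{1}{25}\begin{pmatrix} 9 & 12 \\ 12 & 16 \end{pmatrix} \quad\text{and}\quad \frac{1}{25}\begin{pmatrix} 16 & -12 \\ -12 & 9 \end{pmatrix}.$$

What's going on?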
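
The arithmetic behind the cumbersome example can be delegated to a few lines
of exact rational computation. This is a sketch for experimentation only; it
relies on the criterion stated at the beginning of Section 74 (a linear
transformation is a projection if and only if it is idempotent), and the
helper names mat_mul, mat_add, scaled, and is_idempotent are made up here.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_add(a, b):
    """Entrywise sum of two matrices of the same shape."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def scaled(c, a):
    """The matrix a with every entry multiplied by the scalar c."""
    return [[c * entry for entry in row] for row in a]


def is_idempotent(e):
    """True when e times e equals e."""
    return mat_mul(e, e) == e


E = scaled(Fraction(1, 25), [[9, 12], [12, 16]])
F = scaled(Fraction(1, 25), [[16, -12], [-12, 9]])

print(is_idempotent(E), is_idempotent(F))   # prints: True True
print(mat_add(E, F) == [[1, 0], [0, 1]])    # prints: True
print(is_idempotent(mat_add(E, E)))         # prints: False
```

The last line is an instance of the earlier remark that the double of a
non-zero projection is not a projection.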


Problem 73. When is the sum of two projections a projection? 


74. Not quite idempotence 


A linear transformation E is a projection if and only if it is idempotent, and 
that condition is equivalent to the equation E(1 - E) = 0. If that equation
is satisfied then $E = E^2$ and $1 - E = (1 - E)^2$. It follows that if E is
idempotent, then the slightly weaker equations


$$E^2(1 - E) = 0 \quad\text{and}\quad E(1 - E)^2 = 0$$


are satisfied also. Are these necessary conditions sufficient? Caution: does 
the question mean together or separately? 


Problem 74. If E is a linear transformation such that

$$E^2(1 - E) = 0,$$

does it follow that E is idempotent? Is that condition equivalent to

$$E(1 - E)^2 = 0?$$


CHAPTER 5 


DUALITY 


75. Linear functionals 


The most useful functions on vector spaces are the linear functionals.
Recall their definition (Problem 55): a linear functional is a scalar-valued
function $\xi$ such that

$$\xi(x + y) = \xi(x) + \xi(y)$$

and

$$\xi(\alpha x) = \alpha\,\xi(x)$$

whenever x and y are vectors in V and $\alpha$ is a scalar.
Example on $\mathbb{R}^n$: $\xi(x_1, x_2, \ldots, x_n) = 3x_1$.
Example on $\mathbb{R}^3$: $\xi(x, y, z) = x + 2y + 3z$.
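Examples on the space P of polynomials:

$$\xi(p) = \int_0^1 p(t)\,dt, \quad\text{or}\quad \int_{-1}^{+1} p(t)\,dt, \quad\text{or}\quad \int_{-2}^{+9} t^2 p(t^2)\,dt, \quad\text{or}\quad \left.\frac{d^2p}{dt^2}\right|_{t=1}.$$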

The most trivial linear functional is also the most important one,
namely 0. That is: if $\xi$ is defined by

$$\xi(x) = 0$$

for all x, then $\xi$ is a linear functional. Except for this uninteresting case,
every linear functional takes on every scalar value; that's an easy exercise
(see Problem 55(1)). So, for instance, if $\xi_0$ is a non-zero linear functional
on a finite-dimensional vector space V, then there always exists a vector x
in V such that $\xi_0(x) = 3$. Does it work the other way? That is: if $x_0$ is a
non-zero vector in a finite-dimensional vector space V, does there always exist
a linear functional $\xi$ such that $\xi(x_0) = 3$? That takes a little more thought,
but the answer is still yes. To see that, let $\{x_1, x_2, \ldots, x_n\}$ be a basis for V
that has the prescribed vector $x_0$ as one of its elements, say, $x_1 = x_0$; then
the first example above, read in coordinates with respect to that basis,
does the job.
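
In a coordinate space the existence claim can be made concrete. The sketch
below is an illustration and deliberately takes a shortcut that differs from
the basis-extension argument of the text: instead of extending $x_0$ to a basis,
it picks a coordinate at which $x_0$ is non-zero and rescales the corresponding
coordinate functional. The function name functional_taking_3_at is
hypothetical, and vectors are assumed to be tuples of integers or fractions.

```python
from fractions import Fraction


def functional_taking_3_at(x0):
    """Return a linear functional xi on n-tuples with xi(x0) == 3 (x0 non-zero)."""
    index = next((k for k, c in enumerate(x0) if c != 0), None)
    if index is None:
        raise ValueError("x0 must be a non-zero vector")
    scale = Fraction(3) / Fraction(x0[index])
    # A scalar multiple of a coordinate functional is again a linear functional.
    return lambda x: scale * x[index]


x0 = (0, 4, -1)
xi = functional_taking_3_at(x0)
print(xi(x0))          # prints: 3
print(xi((1, 8, 5)))   # prints: 6
```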

These statements are mainly about the ranges of linear functionals; 
can something intelligent be said about their kernels? If, for instance, $\xi$
and $\eta$ are linear functionals (on the same vector space V) and $\alpha$ is a scalar
such that $\eta(x) = \alpha\xi(x)$ for every x in V, then, clearly, $\eta(x) = 0$ whenever
$\xi(x) = 0$. Is there a chance that the converse statement is true?


Problem 75. If $\xi$ and $\eta$ are linear functionals (on the same vector
space) such that $\eta(x) = 0$ whenever $\xi(x) = 0$, must there always
exist a scalar $\alpha$ such that $\eta(x) = \alpha\xi(x)$ for all x?


76. Dual spaces 


Old vector spaces can sometimes be used to make new ones. A typical ex- 
ample of such a happening is the formation of the subspaces of a vector 
space. Another example starts from two spaces, a vector space and a sub- 
space, and forms the quotient space. Still another example starts from two 
arbitrary vector spaces and forms their direct sum. 

One of the most important ways to get new vector spaces from old ones
is duality: corresponding to every real vector space V there is a so-called
dual space V'. The elements of V' are easy enough to describe: they are
the linear functionals on V. Linear functionals can be added: if $\xi$ and $\eta$ are
linear functionals on V, then a linear functional $\sigma = \xi + \eta$ is defined for
all x by

$$\sigma(x) = \xi(x) + \eta(x).$$

Linear functionals can be multiplied by scalars: if $\xi$ is a linear functional
and $\alpha$ is a scalar, then a linear functional $\tau = \alpha\xi$ is defined for all x by

$$\tau(x) = \alpha\,\xi(x).$$

With these definitions of addition and scalar multiplication the set V' of
all linear functionals becomes a vector space, and that is the dual of V.
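
The two pointwise definitions translate directly into code when a linear
functional is modelled as a function from tuples to numbers. The following is
a small sketch under that modelling assumption; the names add and scale and the
sample functionals are invented for the illustration.

```python
def add(xi, eta):
    """The functional sigma = xi + eta, defined pointwise."""
    return lambda x: xi(x) + eta(x)


def scale(alpha, xi):
    """The functional tau = alpha * xi, defined pointwise."""
    return lambda x: alpha * xi(x)


def xi(v):
    """The example x + 2y + 3z on R^3 from Section 75."""
    return v[0] + 2 * v[1] + 3 * v[2]


def eta(v):
    """A second sample functional on R^3 (chosen arbitrarily)."""
    return v[2] - v[0]


sigma = add(xi, eta)
tau = scale(5, xi)

u, w = (1, 0, 2), (0, 1, -1)
u_plus_w = tuple(a + b for a, b in zip(u, w))

# The new functionals are additive again, as membership in V' requires.
print(sigma(u_plus_w) == sigma(u) + sigma(w))   # prints: True
print(tau(u_plus_w) == tau(u) + tau(w))         # prints: True
```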


Problem 76. Is the dual of a finite-dimensional vector space
finite-dimensional?


77. Solution of equations


Every vector in a vector space (except the zero vector) is just like every 
other vector, and every linear functional (except the zero linear functional) 
is just like every other. If, however, F is a field, and, for some positive
integer n, the particular vector space under consideration is $F^n$, then the
existence of built-in coordinates makes it possible to single out certain
special vectors and special linear functionals. Thus, for instance, the vectors
that have only one coordinate different from 0 and that one equal to 1 are
often made to play an important role, and the same is true of the linear
functionals $p_j$ defined by

$$p_j(x_1, \ldots, x_n) = x_j, \qquad j = 1, \ldots, n,$$

called the coordinate functionals or coordinate projections.

These vectors and functionals are most conspicuous in the procedure
called "solving" systems of n linear equations in n unknowns. Here is what
that seems to mean to most students: keep forming linear combinations of
the given equations till they take the form

$$x_1 = \alpha_1, \ldots, x_n = \alpha_n,$$

and then feel justified in deciding that $(\alpha_1, \ldots, \alpha_n)$ is the sought-for
solution.

The most fussily honest way of describing what the reasoning shows is 
to say that IF there is a solution, THEN the procedure leads to one, but an 
existence proof of some kind is probably called for, and so is a uniqueness 
proof. Some teachers worry about putting such an incomplete tool into the 
hands of their students, and feel called upon to justify themselves by an airy 
wave of the hand and the suggestion “just follow the steps backwards, and 
you'll be all right". Is that always true? 

A linear equation is presumably something of the form

$$y(x) = \alpha,$$

where y is a linear functional (given), $\alpha$ is a scalar (known), and x is a vector
(unknown). A system of n linear equations in n unknowns, then, involves
n linear functionals $y_1, \ldots, y_n$ and n scalars $\alpha_1, \ldots, \alpha_n$; what is wanted is
a vector $x = (x_1, \ldots, x_n)$ such that

$$y_j(x) = \alpha_j, \qquad j = 1, \ldots, n.$$

Another way of saying all this is to define a linear transformation T from
$F^n$ to $F^n$ by writing

$$T(x) = T(x_1, \ldots, x_n) = (y_1(x), \ldots, y_n(x))$$

and then hope that enough is known about T to guarantee that its range
is all of $F^n$ (existence of solutions) and that its kernel is {0} (uniqueness).
The hope, in other words, is that T is invertible; if it is, then the desired
solution $(x_1, \ldots, x_n)$ is given by $T^{-1}(\alpha_1, \ldots, \alpha_n)$.
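
The procedure of forming linear combinations of the given equations can be
sketched as Gauss-Jordan elimination with exact rational arithmetic. This is
an illustration only: the function names solve_by_combinations and apply_t and
the sample 2 x 2 system are assumptions made for the example, and the
elimination scheme is the standard one, not something the text prescribes.
The final line substitutes the answer back into the original equations, which
is the existence check that the procedure by itself does not supply.

```python
from fractions import Fraction


def solve_by_combinations(rows, rhs):
    """Gauss-Jordan elimination on the system rows[i] . x = rhs[i].

    Every step replaces an equation by a linear combination of equations.
    Returns [a_1, ..., a_n] once the system reads x_1 = a_1, ..., x_n = a_n,
    or None if that form cannot be reached.
    """
    n = len(rows)
    aug = [[Fraction(c) for c in row] + [Fraction(b)]
           for row, b in zip(rows, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [entry / lead for entry in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def apply_t(rows, x):
    """Evaluate T(x) = (y_1(x), ..., y_n(x)), where y_i has coefficients rows[i]."""
    return [sum(Fraction(c) * xi for c, xi in zip(row, x)) for row in rows]


rows = [[2, 1], [1, 3]]   # the equations 2 x_1 + x_2 = 5 and x_1 + 3 x_2 = 10
rhs = [5, 10]
solution = solve_by_combinations(rows, rhs)
print(solution == [1, 3])                 # prints: True
print(apply_t(rows, solution) == rhs)     # prints: True
```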

What hidden assumption is there in the usual solution procedure that 
might be enough to imply the invertibility of T? The not especially well 
hidden assumption is that it is possible to

"keep forming linear combinations of the given equations till they
take the form $x_1 = \alpha_1, \ldots, x_n = \alpha_n$".


In other words: the span of the linear functionals $y_1, \ldots, y_n$ (their span in
the dual space of $F^n$, of course) contains each of the coordinate
projections.


Problem 77. If $y_1, \ldots, y_n$ are linear functionals on $F^n$ such that
each of the coordinate projections belongs to their span, does it
always follow that the linear transformation T from $F^n$ to $F^n$ defined
by

$$T(x) = (y_1(x), \ldots, y_n(x))$$

is invertible?


78. Reflexivity 


Sometimes two different vector spaces can resemble each other so much
that it's hard to tell them apart. At first blush the space $P_3$ (polynomials of
degree 3 or less) and the space $\mathbb{R}^4$ (sequences of length 4) may not look
much alike, but on second thought maybe they do. What are vector
properties of polynomials such as

$$\alpha_0 + \alpha_1 t + \alpha_2 t^2 + \alpha_3 t^3,$$

and how do they compare with the vector properties of sequences such as

$$(\xi_1, \xi_2, \xi_3, \xi_4)?$$

In this question the traditional use of the alphabet hurts a little rather than
helps. Change the $\alpha$'s to $\xi$'s and jack up their indices by one, or else change
the $\xi$'s to $\alpha$'s and lower their indices by one, and the differences tend to go
away: the only difference between the vector

$$\xi_1 + \xi_2 t + \xi_3 t^2 + \xi_4 t^3$$

in $P_3$ and the vector

$$(\xi_1, \xi_2, \xi_3, \xi_4)$$

in $\mathbb{R}^4$ is that one uses plus signs and t's and the other uses parentheses and
commas: notation, not substance. More elegantly said: there is a natural
one-to-one correspondence between $\mathbb{R}^4$ and $P_3$, namely the
correspondence T defined by

$$T(\xi_1, \xi_2, \xi_3, \xi_4) = \xi_1 + \xi_2 t + \xi_3 t^2 + \xi_4 t^3,$$

and both that correspondence and its inverse are linear. Is the meaning of
that sentence clear? It says that not only is T a one-to-one correspondence,
but, moreover, if $\xi = (\xi_1, \xi_2, \xi_3, \xi_4)$ and $\eta = (\eta_1, \eta_2, \eta_3, \eta_4)$ are in $\mathbb{R}^4$ and
$\alpha$ and $\beta$ are scalars, then

$$T(\alpha\xi + \beta\eta) = \alpha T(\xi) + \beta T(\eta),$$

and also that if p and q are in $P_3$, then

$$T^{-1}(\alpha p + \beta q) = \alpha T^{-1}(p) + \beta T^{-1}(q).$$

(See Problem 60.) An informal but not misleading way of saying all this is
to say that $P_3$ and $\mathbb{R}^4$ differ in notation only, or, even more informal and
perhaps a tiny bit misleading, that $P_3$ and $\mathbb{R}^4$ are essentially the same.
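
The linearity of the correspondence T can be spot-checked numerically. The
sketch below is only an illustration: it models a polynomial as a Python
function of t, the helper name combo and the sample data are invented, and the
comparison at four points relies on the standard fact that two polynomials of
degree at most 3 that agree at four distinct points are equal.

```python
def T(xi):
    """Send (xi_1, xi_2, xi_3, xi_4) to the polynomial xi_1 + xi_2 t + xi_3 t^2 + xi_4 t^3."""
    return lambda t: xi[0] + xi[1] * t + xi[2] * t ** 2 + xi[3] * t ** 3


def combo(alpha, xi, beta, eta):
    """The 4-tuple alpha * xi + beta * eta, computed coordinatewise."""
    return tuple(alpha * a + beta * b for a, b in zip(xi, eta))


xi, eta = (1, 0, -2, 5), (3, 1, 4, 0)
alpha, beta = 2, -3

lhs = T(combo(alpha, xi, beta, eta))                          # T(alpha xi + beta eta)


def rhs(t):
    """The polynomial alpha T(xi) + beta T(eta), evaluated at t."""
    return alpha * T(xi)(t) + beta * T(eta)(t)


print(all(lhs(t) == rhs(t) for t in (0, 1, 2, 3)))            # prints: True
```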

The dignified word for "essentially the same" is isomorphic. An
isomorphism from a vector space V to a vector space W is a one-to-one linear
correspondence T from (all of) V to (all of) W; two spaces are called
isomorphic if there exists an isomorphism between them. Trivially V and V
are always isomorphic, and if V and W are isomorphic, then so are W and 
V (the inverse of an isomorphism is an isomorphism), and, finally, if U, V, 
and W are vector spaces such that U is isomorphic to V, and V is isomorphic 
to W, then U is isomorphic to W (the composition of two isomorphisms is 
an isomorphism). What was just said is that isomorphism is an equivalence 
relation—and nothing that has been said on the subject so far is deep at 
all. 

Isomorphisms preserve all important properties of vector spaces.
Example: isomorphic vector spaces have the same dimension. Proof: if
$\{x_1, \ldots, x_n\}$ is a basis for V and if T is an isomorphism from V to W, then
$\{Tx_1, \ldots, Tx_n\}$ is a basis for W. Reason: if

$$\alpha_1 Tx_1 + \cdots + \alpha_n Tx_n = 0,$$

then

$$T(\alpha_1 x_1 + \cdots + \alpha_n x_n) = 0,$$

which implies that

$$\alpha_1 x_1 + \cdots + \alpha_n x_n = 0,$$

and hence that

$$\alpha_1 = \cdots = \alpha_n = 0;$$

the $Tx_j$'s are linearly independent. If, moreover, y is an arbitrary vector
in W, then $T^{-1}y$ (in V) is a linear combination of the $x_j$'s; that is, $T^{-1}y$
has the form $\alpha_1 x_1 + \cdots + \alpha_n x_n$ for suitable $\alpha$'s. Apply T to conclude that
$y = \alpha_1 Tx_1 + \cdots + \alpha_n Tx_n$: the $Tx_j$'s span W.

The converse of the result just proved is also true: if two
finite-dimensional vector spaces V and W have the same dimension, then they are
isomorphic. Proof: if $\{x_1, \ldots, x_n\}$ is a basis for V and $\{y_1, \ldots, y_n\}$ is a basis
for W, then there is one and only one linear transformation T from V to
W such that $Tx_j = y_j$ (j = 1, ..., n), and that transformation is an
isomorphism from V to W.

The proof just sketched is correct, but in the view of some mathemati- 
cians it is ugly—it is unnatural. Reason: it depends on two arbitrary (un- 
natural) choices—a basis for V must be chosen and a basis for W must be 
chosen, and the argument depends on those two bases. If those bases are 
changed, the isomorphism T changes. Yes, sure, there exists an isomor- 
phism, and, in fact, there are many isomorphisms from V to W, but there 
is no reason to prefer any one of them to any other—there is no natural 
way to make the choice between them. 

There is one celebrated circumstance in which a natural isomorphism
between two vector spaces does spring to the eye, and that is between a
finite-dimensional vector space V and the dual space (V')' of its dual space
V'. (It is more convenient and customary to denote that second dual space
by V''.) The elements of V'' are linear functionals of linear functionals.
What is an example of such a thing? Here is one: fix a vector $x_0$ in V, and
then, for each element $\eta$ of V' (that is, for each linear functional $\eta$ on
V) write

$$x_0'(\eta) = \eta(x_0).$$

Emphasis: $x_0$ is held fixed here (in V) and $\eta$ is allowed to vary (in V').
Consequence: $x_0'$ is a function on V', a function of $\eta$, and half a minute's
staring at the way $\eta(x_0)$ (which equals $x_0'(\eta)$) depends on $\eta$ should convince
everyone that $x_0'$ is a linear functional on V'.

Does every linear functional on V' arise in this way?


Problem 78. If V is a vector space, and if for each vector x in V an
element x' (= Tx) of V'' is defined by

$$x'(\eta) = \eta(x),$$

then T is always a linear transformation from V to V'' (verification?);
is it always an isomorphism?


Comment. The question belongs to one of the subtlest parts of linear al- 
gebra; to put it into the right context a couple of comments are advisable. 

(1) The mapping T here defined is called the natural mapping from
V to V''. In case it happens that T maps V onto V'' (so that it is an
isomorphism), the vector space V is called reflexive. (Warning: for
infinite-dimensional, topological, vector spaces the same word is defined with an
extra twist; the use of the present definition in those cases can lead to
error and confusion.)

(2) If a finite-dimensional vector space V is reflexive (if, that is, not
only are V and V'' isomorphic, but, in fact, the natural mapping is an
isomorphism) then it is frequently convenient (though mildly sloppy) to
identify the isomorphic spaces V and V''. In more detail: if V is reflexive,
then each vector x in V is regarded as the same as its image x' (= Tx) in
V''. As a special case, recall the construction (described in Solution 76) of
a basis dual to a prescribed one. Start with a basis

$$X = \{x_1, \ldots, x_n\}$$

in V, let

$$U = \{u_1, \ldots, u_n\}$$

be the dual basis in V' (so that $u_i(x_j) = \delta_{ij}$ for all i and j), and do it again:
let

$$X' = \{x_1', \ldots, x_n'\}$$

be the basis in V'' dual to U (so that

$$x_j'(u_i) = \delta_{ij}$$

for all i and j). Since

$$x_j'(u_i) = u_i(x_j)$$

for all i and j, it follows that

$$x_j'(u) = u(x_j)$$

for all u in V' and for all j, so that $x_j'$ is exactly the image of $x_j$ under the
natural mapping T from V to V''. The proposed identification declares that
for each j the vectors $x_j'$ and $x_j$ are the same, and hence that the bases X'
and X are the same.


79. Annihilators 


The kernel of a linear transformation (see Problem 56), and, in particular, 
the kernel of a linear functional is the set (subspace) of vectors at which it 
takes the value 0. A question can be asked in the other direction: if x is a 
vector in a vector space V, what is the set of all linear functionals u on V 
such that u(x) = 0? Frequently used reformulation: what is the annihilator
of x? An obviously related more general question is this: if M is a subset
(possibly but not necessarily a subspace) of a vector space V, what is the set
of all linear functionals u on V such that u(x) = 0 for every x in M? The
answer, whatever it is, is denoted by $M^0$ and is referred to as the annihilator
of M.

Trivial examples: $O^0 = V'$ and $V^0 = O$ (in V'). If V is
finite-dimensional and M contains a non-zero vector, then (see the discussion
preceding Problem 63) $M^0 \neq V'$.

The annihilator of a singleton (M = {x}) consists of all those linear
functionals u for which x is in ker u. In the abbreviated language of double
duality (see Problem 78) $\{x\}^0$ is the kernel (in V') of the linear functional
x in V'' (originally denoted by x' in Problem 78). Consequence: the
annihilator of {x} is always a subspace of V'. This consequence could have been
derived perfectly easily without the double duality discussion, but that is
where the result naturally belongs. If M is an arbitrary subset of V, then
$M^0$ is the intersection of all the annihilators, such as $\{x\}^0$, of the vectors
x in M. Consequence: the annihilator of every set (in V) is a subspace (of
V').

Since $M^0$ is a subspace, it makes sense to speak of dim $M^0$; can
anything intelligent be said about it?


Problem 79. If M is an m-dimensional subspace of an
n-dimensional vector space V, what is the dimension of $M^0$ in V'?


80. Double annihilators 


It is a good rule in most of mathematics that if you can do something once,
you should do it again, and again and again and again: iterate whenever
possible. Example: if it is a good thing to start with a subspace M of a
vector space V and form its annihilator $M^0$ in V', then it is a good thing
to do it again. Doing it again means to start with $M^0$ in V' and form its
annihilator $(M^0)^0$ in V'' (except that, just as for (V')', it is saner to denote
that double annihilator by $M^{00}$).


Problem 80. If M is a subspace of a finite-dimensional vector space
V, what is the relation between M and $M^{00}$?


Comment. Strictly speaking M and $M^{00}$ are incomparable: they are
subspaces of the different vector spaces V and V''. Since, however, every
finite-dimensional vector space is reflexive, the identification convention of
Problem 78 can and should be applied. According to that convention the
space V'' is the same as the space V, and both M and $M^{00}$ are subspaces
of that space.


81. Adjoints 


The concept of duality was defined in terms of very special linear transfor- 
mations, namely linear functionals; does it have anything to do with more 
general linear transformations? Yes, it does. 

Suppose that A is an arbitrary linear transformation on a vector space 
V and that u is an arbitrary element of the dual space V'. With those two 
tools at hand, there are two natural things to do to any particular vector 
x in V: form the vector Ax and form the scalar u(x). And there is also a
third thing: the two tools can form a conspiracy by being used one after the
other (in the only order in which that is possible). It makes sense, that is, in
addition to applying either A or u to x, to apply both by forming u(Ax). If
A and u are regarded as temporarily fixed, the expression u(Ax) depends
on x alone; it is a function of x. Since both A and u depend linearly on x,
so does their composition. If, in other words, a function v on V is defined
by

$$v(x) = u(Ax),$$

then v is a linear functional, an element of V'.

A minor miracle just occurred: a linear transformation on V and a
linear functional in V' collaborated to produce a new element of V'. That
new element v can be viewed as the result of operating on the old element
u by a transformation A', so that

$$v = A'u.$$

The transformation A' is called the adjoint of A; it sends vectors in V' to
vectors in V'.
How does A' operate on V'? Unsurprising answer: linearly. That is: if

$$u = \alpha_1 u_1 + \alpha_2 u_2,$$

then

$$A'u = \alpha_1 A'u_1 + \alpha_2 A'u_2;$$

the verification of that equation should by now be regarded as dull routine.
That answers the question of how A'u depends on u. A more interesting
and less commonplace question is this: how does A'u depend on A?
Answer: simply and beautifully, with only one small and harmless surprise.

It is, for instance, child's play to verify that if A = 0 (on V), then A' = 0
(on V') and, similarly, that if A = 1 (on V), then A' = 1 (on V'). Simpler
put: 0' = 0 and 1' = 1. The proof that if A and B are linear transformations
on V, then

$$(A + B)' = A' + B'$$

is just as easy; the only difference is that a few more symbols are involved.
Since, moreover, if A is a linear transformation on V and $\alpha$ is a scalar, then

$$(\alpha A)' = \alpha A',$$

a part of the relation between linear transformations and their adjoints can
be described by saying that A' depends linearly on A.

What about products? If A and B are linear transformations (on V),
and if C = AB, what can be said about C' (on V') in terms of A' and B'?
Here comes the small and harmless surprise:

$$\begin{aligned}
(C'u)(x) = u(Cx) = u((AB)x) &= u(A(Bx)) \\
&= (A'u)(Bx) && \text{(by the definition of } A'\text{)} \\
&= (B'(A'u))(x) && \text{(by the definition of } B'\text{)} \\
&= ((B'A')u)(x).
\end{aligned}$$

Conclusion: (AB)' = B'A'; the order of the product became reversed.
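
The reversal of order can be watched in a small computation. The sketch below
is illustrative only: a linear functional is modelled as a Python function on
pairs, a linear transformation as a 2 x 2 matrix acting on pairs, and the
names adjoint, mat_vec, and mat_mul as well as the sample data are assumptions
made for the example. The adjoint is implemented literally from its
definition, (A'u)(x) = u(Ax). Since every functional involved is linear,
agreement on the two basis vectors (1, 0) and (0, 1) suffices.

```python
def mat_vec(a, x):
    """Apply the matrix a (a list of rows) to the vector x."""
    return tuple(sum(entry * c for entry, c in zip(row, x)) for row in a)


def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def adjoint(a):
    """A' sends the functional u to the functional x -> u(Ax)."""
    return lambda u: (lambda x: u(mat_vec(a, x)))


def u(x):
    """A sample linear functional on pairs (chosen arbitrarily)."""
    return 4 * x[0] - x[1]


A = [[1, 2], [0, 1]]
B = [[0, 1], [3, -1]]

left = adjoint(mat_mul(A, B))(u)        # (AB)'u
right = adjoint(B)(adjoint(A)(u))       # B'(A'u)
other = adjoint(A)(adjoint(B)(u))       # A'(B'u), the unreversed order

basis = [(1, 0), (0, 1)]
print(all(left(x) == right(x) for x in basis))   # prints: True
print(all(left(x) == other(x) for x in basis))   # prints: False
```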
The last question of this kind concerns inverses: if A is invertible, what
can be said about $(A^{-1})'$? Answer: if A is invertible (on V), then A' is
invertible (on V') and

$$(A^{-1})' = (A')^{-1}.$$

The proof makes obvious use of the multiplicativity equation just proved
and of the basic relation 1' = 1.

At this point another opportunity presents itself to act on the principle
of doing something doable again and again: if a linear transformation A on
a vector space V yields a linear transformation A' on V', then A' yields a
linear transformation (A')' on V''; what is the relation between A and
its double adjoint? First comment: notational sanity suggests that (A')' be
denoted by A''. Second comment: in order for A'' and A to be comparable,
it would be good to have them live on the same domain, and, to achieve
that, it is a good idea to assume that V is finite-dimensional and, therefore,
reflexive. Once that's done, the answer to the question becomes simple:

$$A'' = A.$$

Proof: if x is in V and u is in V', then

$$u(Ax) = (A'u)(x) \quad \text{(by the definition of } A'\text{)}$$

and

$$(A'u)(x) = u(A''x) \quad \text{(by the definition of } A''\text{)}.$$


So much for the general properties of adjoints; it is high time that a 
deeper understanding of them be acquired by studying their more special 
properties, which means their relations to other concepts. Example: linear 
transformations are intimately associated with certain subspaces, namely 
their kernels and ranges. What does the formation of adjoints do to kernels 
and ranges? 


Problem 81. If A is a linear transformation on a finite-dimensional
vector space V and if A' is its adjoint on V', what is the relation
between the ranges and the kernels of A and A'?


Comment. The question can be asked in any case, but the restriction to 
finite-dimensionality is a familiar and sane precaution. 


82. Adjoints of projections 


Can the adjoint of a concretely presented transformation be exhibited con- 
cretely? What, for instance, happens with projections? 


Problem 82. If M and N are subspaces of a finite-dimensional
vector space V, what is the adjoint of the projection of V to M along N?


83. Matrices of adjoints


Linear transformations have adjoints; what do matrices have? 


Problem 83. What is the relation between the matrix, with respect to 
some basis, of a linear transformation on a finite-dimensional vector 
space and the matrix, with respect to the dual basis, of its adjoint on 
the dual space? 


CHAPTER 6 


SIMILARITY 


84. Change of basis: vectors 


If V is an n-dimensional vector space, with a prescribed ordered basis
$\{x_1, \ldots, x_n\}$, then each vector x determines an ordered n-tuple of scalars.
This is an elementary fact by now: if the expansion of x in terms of the
x's is

$$x = \alpha_1 x_1 + \cdots + \alpha_n x_n,$$

then the ordered n-tuple determined by x is just the n-tuple

$$(\alpha_1, \ldots, \alpha_n)$$

of coefficients. The game can, of course, be played in the other direction:
once a basis is fixed, each ordered n-tuple of scalars determines a vector,
namely the vector whose sequence of coefficients it is.

Now change the rules again, or, rather, play the already changed rules
but change the emphasis. Given an n-dimensional vector space V and an
ordered n-tuple $(\alpha_1, \ldots, \alpha_n)$ of scalars, note that each basis $\{x_1, \ldots, x_n\}$
of V determines a vector in V, namely the vector described by the first
equation above. That vector depends, obviously, on the basis. If $\{y_1, \ldots, y_n\}$
is also a basis of V, it would be a surprising coincidence if the determined
vector

$$y = \alpha_1 y_1 + \cdots + \alpha_n y_n$$

turned out to be the same as x; as the basis changes, the vector changes.
How?


Problem 84. If $\{x_1, \ldots, x_n\}$ and $\{y_1, \ldots, y_n\}$ are bases of a vector
space, and if

$$x = \alpha_1 x_1 + \cdots + \alpha_n x_n$$

and

$$y = \alpha_1 y_1 + \cdots + \alpha_n y_n,$$

what is the relation between the vectors x and y?


Reformulation. What happens to vectors under a "change of basis"?


Emphasis. The same coefficients $(\alpha_1, \ldots, \alpha_n)$ appear in the two displayed
equations.


85. Change of basis: coordinates 


If $\{x_1, \ldots, x_n\}$ and $\{y_1, \ldots, y_n\}$ are bases of a vector space, the problem
of changing from one to the other can be thought of in two ways.

(1) Given an ordered n-tuple $(\alpha_1, \ldots, \alpha_n)$ of scalars, what is the
relation between the vectors

$$x = \alpha_1 x_1 + \cdots + \alpha_n x_n$$

and

$$y = \alpha_1 y_1 + \cdots + \alpha_n y_n?$$

(2) Given a vector z, what is the relation between its coordinates with
respect to the x's and the y's? A preceding problem (84) took the first point
of view. How does the answer obtained compare with the one demanded
by the second point of view?


Problem 85. If $\{x_1, \ldots, x_n\}$ and $\{y_1, \ldots, y_n\}$ are bases of a vector
space, and if

$$\xi_1 x_1 + \cdots + \xi_n x_n = \eta_1 y_1 + \cdots + \eta_n y_n,$$

what is the relation between the coordinates $\xi$ and $\eta$?


86. Similarity: transformations 


Vectors are in the easy part of linear algebra; the more challenging and
more useful part deals with linear transformations. One and the same
"coordinate vector" $(\alpha_1, \ldots, \alpha_n)$ can correspond to two different elements of
a vector space via two different bases; that's what Problem 84 is about.
A parallel statement on the higher level is that one and the same matrix
can correspond to two different linear transformations on a vector space;
that's what the present discussion is about.


Problem 86. If $\{x_1, \ldots, x_n\}$ and $\{y_1, \ldots, y_n\}$ are bases of a vector
space, and if

$$Bx_j = \sum_i \alpha_{ij} x_i$$

and

$$Cy_j = \sum_i \alpha_{ij} y_i,$$

what is the relation between the linear transformations B and C?


Reformulation. What happens to linear transformations under a change 
of basis? 


Emphasis. The same matrix $(\alpha_{ij})$ appears in the two displayed equations.


Comment. This problem is only slightly harder than Problem 84, but
much deeper. Two transformations related in the way here described are 
called similar, and similarity is the right, the geometric, way to classify lin- 
ear transformations. To say that two linear transformations are similar is 
to say, in effect, that they are “essentially the same”. Similarity is the single 
most important possible relation between linear transformations—it lies 
at the heart of linear algebra. 


87. Similarity: matrices 


A change of basis can be looked at in two ways: geometrically (what does
it do to vectors? See Problem 84) and numerically (what does it do to
coordinates? See Problem 85). The same two points of view are available in
the study of the effect of a change of basis on the higher level: geometric
(linear transformations) and numerical (matrices). The first of these was
treated by Problem 86; here is the second.


Problem 87. If one basis of a vector space is used to express a linear
transformation as a matrix, and then another basis is used for the 
same purpose (for the same linear transformation), the result is two 
matrices; what is the relation between them?


88. Inherited similarity


Similarity is sometimes passed on from one pair of transformations to
another. Example: if B and C are similar, then so are $B^2$ and $C^2$. Indeed: if
$TBT^{-1} = C$ (see Solution 86), then

$$TB^2T^{-1} = TB(T^{-1}T)BT^{-1} = (TBT^{-1})(TBT^{-1}) = CC = C^2.$$

Similarly, of course, if B and C are similar, then so are $B^n$ and $C^n$ for all
positive integers n, and, therefore, so are p(B) and p(C) for all
polynomials p. (Minor comment: if B is similar to a scalar $\gamma$, then B is equal to $\gamma$.
Reason: $T\gamma T^{-1} = \gamma TT^{-1}$.)

The kind of reasoning used here can go a bit further. It proves, for 
instance, that if B and C are similar, then so are B’ and C". (Form the 
adjoints of both sides of the equation TBT-! = C, and use the results 
of Problem 80.) It proves also that if B and C are similar and if both are 
invertible, then B^! and C—? are similar. (Form the inverses of both sides 
of the similarity equation.) What is true about products? 
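
The inheritance of similarity by powers and polynomials can be checked numerically. The following is only a sketch with exact rational arithmetic; the matrices B and T and the polynomial p(t) = t^2 + 3t + 2 are hypothetical sample data, not taken from the text.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse2(T):
    """Exact inverse of an invertible 2 x 2 matrix."""
    (a, b), (c, d) = T
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def conjugate(T, B):
    """Return T B T^{-1}."""
    return matmul(matmul(T, B), inverse2(T))

def p(M):
    """Sample polynomial p(t) = t^2 + 3t + 2 evaluated at a 2 x 2 matrix."""
    M2 = matmul(M, M)
    return [[M2[i][j] + 3 * M[i][j] + (2 if i == j else 0) for j in range(2)]
            for i in range(2)]

# Hypothetical sample data (T has determinant 1, so it is invertible).
B = [[1, 2], [3, 4]]
T = [[2, 1], [1, 1]]
C = conjugate(T, B)

# T B^2 T^{-1} should equal C^2, and T p(B) T^{-1} should equal p(C).
squares_similar = conjugate(T, matmul(B, B)) == matmul(C, C)
polys_similar = conjugate(T, p(B)) == p(C)
print(squares_similar, polys_similar)
```

The same transformer T works for every power and every polynomial, which is the point of the argument above.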


Problem 88. If B and C are linear transformations (on the same
vector space), is it always true that BC and CB are similar?


Question. Does it make any difference whether B and C are invertible? 


89. Similarity: real and complex 


Anyone who speaks of a vector space must have selected a coefficient field 
to begin with; anyone who speaks of a matrix has in mind its entries, which 
belong to some prescribed field. What happens to vectors, and linear
transformations, and matrices when the field changes? If, in particular, two
different fields, E and F say, appear to be pertinent to some study, with $E \subset F$,
then every vector space over F is automatically a vector space over E also 
(or, more precisely, naturally induces a vector space over E by just restrict- 
ing scalar multiplication to E); the general question is how much informa- 
tion an F fact gives about an E space. The following special question is a 
well known, important, and typical instance. 


Problem 89. If A and B are two real matrices that are similar over
C, do they have to be similar over R? 


90. Rank and nullity


Which linear transformation (matrix) is “bigger”,

$B = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}$ or $C = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}$?


The question doesn’t make sense—except for the size of a matrix (in the 
sense in which these 3 x 3 examples are of size 3) no way of measuring 
linear transformations has been encountered yet. 

The transformation C is invertible, but the transformation B is not: the 
range of B is a 1-dimensional subspace. In other words, the transformation 
B collapses the entire space into a proper subspace—that might be one 
good reason for calling B smaller than C. Another reason might be that B 
“shrinks” some vectors (sends them to 0) and C does not. The dimension 
of the range of a linear transformation—called its rank—is a measure of 
size; in the present example 


rank B = 1 and rank C = 3.


Roughly: transformations are large if their rank is large. The dimension of 
the kernel—called the nullity, and abbreviated as “null”—is another kind 
of measure of size; in the present example 


null B = 2 and null C = 0.


Roughly: transformations are large if their nullities are small. 
Is there a relation between these two measures of size? And what 
about the sizes of a transformation and its adjoint? 
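
The two measures of size can be computed by Gaussian elimination. This is a minimal sketch, assuming that B is the 3 x 3 matrix of all ones and that C is the matrix with ones on the anti-diagonal (which matches the stated ranks 1 and 3); the helper name `rank_and_nullity` is an illustrative choice.

```python
from fractions import Fraction

def rank_and_nullity(M):
    """Row-reduce M over the rationals.

    Returns (number of pivot columns, number of free columns); the first is
    the dimension of the range, the second the dimension of the kernel.
    """
    rows = [[Fraction(x) for x in row] for row in M]
    n_rows, n_cols = len(rows), len(rows[0])
    pivots = 0
    for c in range(n_cols):
        # Look for a row at or below the current pivot row with a non-zero entry.
        pivot = next((i for i in range(pivots, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # free column: it contributes to the kernel
        rows[pivots], rows[pivot] = rows[pivot], rows[pivots]
        for i in range(pivots + 1, n_rows):
            factor = rows[i][c] / rows[pivots][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[pivots])]
        pivots += 1
    return pivots, n_cols - pivots

B = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]   # assumed form of B
C = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]   # assumed form of C

for name, M in (("B", B), ("C", C)):
    r, k = rank_and_nullity(M)
    print(name, "rank", r, "nullity", k)
```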


Problem 90. If A is a linear transformation of rank r on a vector
space of dimension n, what are the possible values of the rank of $A'$?
What are the possible values of the nullity of A?


91. Similarity and rank 


Problem 91. If two linear transformations on a finite-dimensional 
vector space are similar, must they have the same rank? 


92. Similarity of transposes 


If two linear transformations on a finite-dimensional vector space have the 
same rank, must they be similar? That question is the converse of the one 
in Problem 91, and it would be silly to expect it to have an affirmative
answer. (Are two invertible matrices always similar?) In some special cases,
however, the affirmative answer might be true anyway. So, for instance, ev- 
ery transformation has the same rank as its adjoint—could that statement 
be strengthened sometimes to one about similarity? 


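Problem 92. Is every 2 x 2 matrix $\begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$ similar to its transpose


$\begin{pmatrix} \alpha & \gamma \\ \beta & \delta \end{pmatrix}$?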


93. Ranks of sums 


Problem 93. If A and B are linear transformations on a
finite-dimensional vector space, what is the relation of rank(A + B) to
the separate ranks, rank A and rank B? 


94. Ranks of products 


Problem 94. If A and B are linear transformations on a
finite-dimensional vector space, what is the relation of rank AB to the
separate ranks, rank A and rank B?


95. Nullities of sums and products 


Since rank and nullity always add up to the dimension of the space (Prob- 
lem 90), every relation between ranks is also a relation between nullities. 
That is true, in particular, about the sum and product formulas (Problems 
93 and 94), but the nullity relations obtained that way are far from thrilling. 
There is, however, a nullity relation that comes nearer to a thrill, and that 
is not an immediate consequence of the rank relations already available. 


Problem 95. If A and B are linear transformations on a finite- 
dimensional vector space, is there a simple relation involving 
null(AB), null A, and null B?


96. Some similarities


The best way to get a feeling for what similarity means is to look at special 
cases—sometimes the answer is not what you would expect. 


Problem 96. 


l1 TA 
(a) IsB-[0 2 1 | similartoC = 
00 3 


0 1 1 
(b IsB=|0 0 1 | similartoC = 
0 0 0 
2-1-1 
(c) IsB=|0 3 1 | similartoC= 
00 3 
020 
similarto C =| 0 0 2]? 
000 


111 1 0 0 
(e) IsSB-|0 1 | similar to C = | 1 0}? 
0 0 1 1 1. 1d 


Comment. Similar questions make sense, and should be asked, for ma- 
trices of size larger than 3 x 3. 


97. Equivalence


The construction of a matrix associated with a linear transformation
depends on two bases, not one. Indeed, if


$X = (x_1, \ldots, x_n)$ and $\hat{X} = (\hat{x}_1, \ldots, \hat{x}_n)$


are bases of V, and if A is a linear transformation on V, then the matrix of
A with respect to X and $\hat{X}$, denote it temporarily by


$A(X, \hat{X})$,

should be defined by


$Ax_j = \sum_i \alpha_{ij} \hat{x}_i.$




The definition originally given (see Problems 66 and 68) corresponds to
the special case in which $\hat{X} = X$. That special case leads to the concept of
similarity: B and C are similar if there exist bases X and Y such that the
matrix of B with respect to X is equal to the matrix of C with respect to Y,
or, in the notation introduced above, if

$B(X, X) = C(Y, Y).$


The analogous relation suggested by the general case is called
equivalence: B and C are called equivalent if there exist basis pairs $(X, \hat{X})$ and
$(Y, \hat{Y})$ such that

$B(X, \hat{X}) = C(Y, \hat{Y}).$


The principal question about equivalence, written out in complete detail, 
is in spirit the same as the original question (Problem 72) about similarity. 


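Problem 97. If $\{x_1, \ldots, x_n\}$, $\{\hat{x}_1, \ldots, \hat{x}_n\}$, $\{y_1, \ldots, y_n\}$, and
$\{\hat{y}_1, \ldots, \hat{y}_n\}$ are bases of a vector space, and if


$Bx_j = \sum_i \alpha_{ij} \hat{x}_i$

and

$Cy_j = \sum_i \alpha_{ij} \hat{y}_i,$

what is the relation between the linear transformations B and C?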


Reformulation. What happens to linear transformations under two si- 
multaneous changes of bases? 


Emphasis. The same matrix $(\alpha_{ij})$ appears in the two displayed equations.


Comment. The question is somewhat vague, just as it was in Problem 85. 
The relation is that there exist bases with the stated property—and why 
isn't that an answer? The unformulated reason, at the time Problem 85 
was stated, was the hope that the “geometric” definition could be replaced 
by an "algebraic" necessary and sufficient condition. That hope persists 
here too. 


98. Rank and equivalence 


If E is a projection with range M and kernel N, then there exists a basis
$\{x_1, \ldots, x_r, x_{r+1}, \ldots, x_n\}$ of the space such that $\{x_1, \ldots, x_r\}$ is a basis for
M and $\{x_{r+1}, \ldots, x_n\}$ is a basis for N. The matrix of E with respect to that
basis is of the form $\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$, where the top left “1” represents an identity
matrix of size r x r and the bottom right “0” represents a zero matrix of
size (n - r) x (n - r). Consequence: not only do similar projections have
the same rank (Problem 91), but the converse is true: projections of the 
same rank are similar. Since similarity is a much stronger condition than 
equivalence, it follows in particular that projections of the same rank are 
equivalent. Is that statement generalizable? 


Problem 98. If two linear transformations on a finite-dimensional
vector space have the same rank, must they be equivalent? 


CHAPTER 7 


CANONICAL FORMS 


99. Eigenvalues 


A large vector space (one of large dimension, that is) is a complicated ob- 
ject, and linear transformations on it are even more complicated. In the 
study of a linear transformation on a large space it often helps to con- 
centrate attention on the way the transformation acts on small subspaces. 
The phrase “a linear transformation acting on a subspace” is usually inter- 
preted to mean that the subspace is invariant under the transformation (in 
the language of Problem 70), and a “small” subspace is one of dimension 
1 (surely the smallest that a non-trivial subspace can be). In view of these 
comments, a promising approach to the study of linear transformations 
would seem to be to search for invariant subspaces of dimension 1. 

If A is a linear transformation and if x is a vector in ker A then, of
course, Ax = 0, and it follows that $A(\lambda x) = 0$ for every scalar $\lambda$.
Consequence: the 1-dimensional subspace consisting of all scalar multiples of x
is invariant under A. This is an example (an extreme sort of example) of
the possibility described above. What A does to this particular x is simply
to multiply it by 0:


$Ax = 0x.$


A less extreme example might be a linear transformation A and a vector x
such that, say,


$Ax = 7x.$


Can that happen? Sure: it happens, for instance, when $A = 7 \cdot I$ (the
product of the scalar 7 and the identity transformation I). It happens also
in less spectacular (but more typical and more useful) cases such as


7 0 
A= (6 ah zı = (1,0). 


A small modification yields the less special looking example 


Ape (ij s) bici. 


A different looking and perhaps surprising example is 


$A_3 = \begin{pmatrix} 10 & 3 \\ -5 & 2 \end{pmatrix}, \quad x_3 = (1, -1).$
(Verifications?) 

All right, so $A_3 x_3 = 7x_3$; is that an accident or is it a bad habit that
$A_3$ has? What other vectors x have the property that $A_3 x = 7x$?
Unsatisfactory answer: all scalar multiples of $x_3$ have that property. (Right?
$A_3(4x_3) = 4(A_3 x_3) = 4 \cdot 7x_3 = 7 \cdot (4x_3)$.) Are there any others? The
question amounts to asking for solutions $(\alpha_1, \alpha_2)$ of the equations


$10\alpha_1 + 3\alpha_2 = 7\alpha_1,$
$-5\alpha_1 + 2\alpha_2 = 7\alpha_2.$


That’s a routine question and the easily calculated answer is that all
solutions are of the form $(\tau, -\tau)$, and those are exactly the “unsatisfactory”
ones already dismissed.

Is there something special about 7 and $x_3$? Are there other scalars $\lambda$
and other vectors x such that


$A_3 x = \lambda x$?

This time the question is about the solutions of

$10\alpha_1 + 3\alpha_2 = \lambda\alpha_1,$
$-5\alpha_1 + 2\alpha_2 = \lambda\alpha_2,$


and that requires a little more thought.

There is one dull solution, namely $\alpha_1 = \alpha_2 = 0$; that works for every
$\lambda$ and yields no information. If that’s dismissed, if, in other words, only
non-zero vectors are to be accepted as solutions, then the question
becomes this: for which scalar values of $\lambda$ does the matrix


$\begin{pmatrix} 10 - \lambda & 3 \\ -5 & 2 - \lambda \end{pmatrix}$

have a non-trivial kernel? Equivalently: for which scalar values of $\lambda$ is it
true that

$\det \begin{pmatrix} 10 - \lambda & 3 \\ -5 & 2 - \lambda \end{pmatrix} = 0$?


The determinant is easy enough to calculate, and when that's done, the
question becomes this: what are the roots of the equation


$\lambda^2 - 12\lambda + 35 = 0$?


That can be answered by anyone who knows how to solve quadratic
equations. The answer is that there are only two values of $\lambda$, namely 7 and 5.

Curiouser and curiouser. The value 7 is an old friend, with the
corresponding vector $x_3 = (1, -1)$. What vectors work for 5? That is: what are
the (non-zero) solutions of the equations


$10\alpha_1 + 3\alpha_2 = 5\alpha_1,$
$-5\alpha_1 + 2\alpha_2 = 5\alpha_2$?


Easily calculated answer: all vectors of the form $(3\tau, -5\tau)$.
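
The calculations just described are easy to confirm by machine. The sketch below reads the matrix off the coefficients of the displayed equations (rows (10, 3) and (-5, 2)), solves the quadratic with the usual formula, and tests a few sample values of the parameter; the names `A3`, `apply`, and `scale` are illustrative.

```python
import math

def apply(A, x):
    """Apply the matrix A (a list of rows) to the vector x."""
    return tuple(sum(a * xi for a, xi in zip(row, x)) for row in A)

def scale(c, x):
    """Multiply the vector x by the scalar c."""
    return tuple(c * xi for xi in x)

# Coefficient matrix of the displayed equations.
A3 = [[10, 3], [-5, 2]]

# Roots of t^2 - 12 t + 35 = 0 by the quadratic formula.
disc = 12 * 12 - 4 * 35
root = math.isqrt(disc)
eigenvalues = [(12 - root) // 2, (12 + root) // 2]

# Vectors (tau, -tau) should be multiplied by 7, vectors (3 tau, -5 tau) by 5.
checks = []
for tau in (1, 2, -3):
    checks.append(apply(A3, (tau, -tau)) == scale(7, (tau, -tau)))
    checks.append(apply(A3, (3 * tau, -5 * tau)) == scale(5, (3 * tau, -5 * tau)))

print(eigenvalues, all(checks))
```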

The matrix here studied, and its relation to certain special scalars and
vectors, exemplifies quite well the theory on which it all rests. General
definition: an eigenvalue of a linear transformation A is a scalar $\lambda$ such that


$Ax = \lambda x,$


for some non-zero vector x. (With x = 0 the equation is totally useless;
it is satisfied no matter what A and $\lambda$ are.) Every non-zero vector x that
can be used here is called an eigenvector of A corresponding to the
eigenvalue $\lambda$.

A scalar $\lambda$ is an eigenvalue of a linear transformation A on a
finite-dimensional vector space if and only if $A - \lambda I$ has a non-trivial kernel, and
that happens if and only if $\lambda$ satisfies the characteristic equation


$\det(A - \lambda I) = 0.$


The expression $\det(A - \lambda I)$ is a polynomial of degree n (the dimension
of the space) in $\lambda$, called the characteristic polynomial of A. What's
important about the characteristic equation is what its roots are. In much of
linear algebra and its applications the main problem is to find
characteristic equations and their roots, that is eigenvalues. Here is a small sample.
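
For matrices larger than 2 x 2, expanding the determinant by hand gets tedious. One possible way to obtain the coefficients mechanically is the Faddeev-LeVerrier recursion, sketched below; this technique is not discussed in the text and is offered only as an illustration. The function returns the coefficients of the monic polynomial det(tI - A), which differs from det(A - tI) by the factor (-1)^n and therefore has the same roots.

```python
from fractions import Fraction

def char_poly(A):
    """Coefficients [1, c_{n-1}, ..., c_0] of det(tI - A), highest power first.

    Faddeev-LeVerrier recursion:
        M_1 = I,  M_k = A M_{k-1} + c_{n-k+1} I,  c_{n-k} = -trace(A M_k) / k.
    """
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    coeffs = [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]   # plays the role of M_0 = 0
    for k in range(1, n + 1):
        AM = [[sum(A[i][t] * M[t][j] for t in range(n)) for j in range(n)]
              for i in range(n)]
        M = [[AM[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        trace = sum(sum(A[i][t] * M[t][i] for t in range(n)) for i in range(n))
        coeffs.append(-trace / k)
    return coeffs

# The 2 x 2 matrix whose equations were solved above.
print(char_poly([[10, 3], [-5, 2]]))
# A hypothetical 3 x 3 upper triangular sample with diagonal 3, 3, 4.
print(char_poly([[3, 1, 0], [0, 3, 0], [0, 0, 4]]))
```

Exact fractions are used so that the division by k never introduces rounding error.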


Problem 99. What can the characteristic equation of a projection 
be? 


100. Sums and products of eigenvalues


The eigenvalues of “good” matrices can be quite bad. Thus, for instance,
the eigenvalues of a matrix of integers are not necessarily integers: they
are not even necessarily rational. (Example: $\begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}$.) Just how bad can
the eigenvalues of a good matrix be?


Problem 100. Can both the sum and the product of the eigenvalues
of a matrix of rational numbers be irrational?


Comment. The question is about “can”, not about “must”. For the
example $\begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}$ the sum and the product are 0 and -2.


101. Eigenvalues of products 


If A and B are linear transformations on the same finite-dimensional
vector space, and if AB is invertible, then each of A and B is invertible
($\det A \cdot \det B \neq 0$), and therefore BA is invertible. Contrapositively (with the roles
of A and B interchanged): if AB is not invertible, then BA is not
invertible. Another way of stating the result is that if 0 is an eigenvalue of AB,
then 0 is an eigenvalue of BA also. For other eigenvalues the situation is
not so clear: when $\lambda \neq 0$, there doesn't seem to be any way to pass from
information about $\det(AB - \lambda I)$ to information about $\det(BA - \lambda I)$.


Problem 101. If A and B are linear transformations on the same
finite-dimensional vector space, and if $\lambda$ is a non-zero eigenvalue of
AB, must $\lambda$ be an eigenvalue of BA also?


102. Polynomials in eigenvalues 


Problem 102. If A is a linear transformation on a finite-dimensional
vector space and if p is a polynomial, what information do the eigen- 
values of A give about p(A)? 


103. Diagonalizing permutations 


For matrices that are simple enough, the theory of eigenvalues works like 
a charm. For the trivial 1 x 1 matrix (2), there is of course nothing to do. 
The 2 x 2 diagonal matrix


(o 5) 


has two eigenvalues, and its study reduces to that of two trivial matrices of 
size 1 x 1. The 3 x 3 matrix 


2 0 0 
0 3 0 
00 4 


is larger, but it is still a beautiful one. It has three distinct eigenvalues and 
corresponding to them three disjoint eigenspaces—the notion of applying 
eigenvalues to reduce a large study to small pieces (Problem 96) still works 
just fine. 

The matrix 


$\begin{pmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{pmatrix}$


is from the present point of view perhaps a shade less beautiful—it has 
only two eigenvalues but its eigenspaces still have a total dimension 3, and 
its study presents no difficulties. Matrices such as 


$\begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{pmatrix},$


(è s) 


misbehave a little more—the total dimensions of their eigenspaces are not 
as large as one could wish. It begins to look as if eigenvalue theory might 
not stretch to give complete information about matrices. 

Things get really tough with a matrix such as 


A-(5 "e 


Its characteristic equation is 


or, for that matter, 


$\lambda^2 + 1 = 0,$


and since there is no real number $\lambda$ that satisfies that equation, it looks as
if eigenvalue theory might give no information about A.

The disease just noticed is not fatal; a hint to its cure is contained in
the diagnosis. Sure, there is no real number that satisfies the characteris- 
tic equation, but that phenomenon is exactly what complex numbers are 
designed to deal with—and, indeed, they solve the problem. 
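
A few lines of arithmetic show the cure at work. The sketch assumes that the equation in question is t^2 + 1 = 0, the standard example of a real quadratic with no real root; the helper `quadratic_roots` is an illustrative name.

```python
import cmath

def quadratic_roots(a, b, c):
    """Both roots of a t^2 + b t + c = 0, computed over the complex numbers."""
    s = cmath.sqrt(b * b - 4 * a * c)
    return (-b + s) / (2 * a), (-b - s) / (2 * a)

# Discriminant of t^2 + 1: negative, hence no real roots.
disc = 0 * 0 - 4 * 1 * 1

r1, r2 = quadratic_roots(1, 0, 1)
print(disc, r1, r2)

# Each complex root does satisfy the equation.
residuals = [r * r + 1 for r in (r1, r2)]
print(residuals)
```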

The properties of vector spaces and of linear transformations on them 
are strongly influenced by the underlying coefficient field. That fact was 
hardly noticeable till now—much of the theory works equally well for ev- 
ery field. The exposition till now has either tacitly or explicitly assumed 
that the coefficient field was the field R of real numbers—the field that is 
probably the most familiar to most students. In the applications however 
(of linear algebra, and of mathematics in general) the field C of complex 
numbers is often more useful. For that reason, from here on, it will be as- 
sumed that the vector spaces to be considered are complex ones, and that, 
correspondingly, the vectors and linear transformations to be studied
admit complex linear combinations. The typical coordinatized example will
therefore be $C^n$ (not $R^n$).

The problem that follows is a small step toward getting used to the 
appearance of complex numbers. 


Problem 103. What are the eigenvalues and eigenvectors of the
linear transformation A defined on $C^3$ by


$A(x_1, x_2, x_3) = (x_2, x_3, x_1)$?


104. Polynomials in eigenvalues, converse 


Does the converse of Solution 102 have a chance of being true? According
to Solution 102, if $\lambda$ is an eigenvalue of A and p is a polynomial, then $p(\lambda)$
is an eigenvalue of p(A). The converse might be something like this: if $p(\lambda)$
is an eigenvalue of p(A), must it be true that $\lambda$ is an eigenvalue of A? No,
that's absurd. Counterexample: if A = I (the identity transformation),
$\lambda = -1$ (the number), and $p(\lambda) = \lambda^2$, then p(-1) (= 1) is an eigenvalue of
p(A) (= I), but -1 is not an eigenvalue of A.

The negative solution of one possible version of the converse problem
doesn't settle the issue: slightly weaker problems can be posed and can
have hopes of affirmative solutions. Here is one: is every eigenvalue of
p(A) of the form $p(\lambda)$ for some eigenvalue $\lambda$ of A? The question is one
in which the coefficient field matters: it is conceivable that the answers for
the real field and the complex field are different.

The answer for the real field is no. If


$A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix},$


then -1 is an eigenvalue of $A^2$ (check?), but -1 is not of the form $\lambda^2$ for some
eigenvalue $\lambda$ of A, simply because A has no eigenvalues, no real ones,
that is. If the real field is replaced by the complex field in this example, the 
answer changes from no to yes. Is that a lucky property of this example, or 
is it always true? 
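
The "check?" can be delegated to a short script. The sketch assumes that the matrix of the example is the quarter-turn rotation with rows (0, 1) and (-1, 0); the characteristic polynomial of a 2 x 2 matrix is taken in the form t^2 - (trace) t + (determinant).

```python
import cmath

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[0, 1], [-1, 0]]          # assumed form of the example
A2 = matmul(A, A)              # should be minus the identity

trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
disc = trace * trace - 4 * det  # negative: no real eigenvalues

# Over the complex field the eigenvalues exist, and their squares equal -1.
s = cmath.sqrt(disc)
eigs = [(trace + s) / 2, (trace - s) / 2]
squares = [z * z for z in eigs]
print(A2, disc, eigs, squares)
```

For this particular matrix, then, the eigenvalue -1 of the square is of the form p applied to a complex eigenvalue of A; whether that always happens is the problem that follows.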


Problem 104. If A is a linear transformation on a finite-dimensional
(complex) vector space and if p is a polynomial, is every eigenvalue
of p(A) of the form $p(\lambda)$ for some eigenvalue $\lambda$ of A?


105. Multiplicities 
If


$A = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{pmatrix},$


is 3 an eigenvalue of A? Sure, that's obvious: if $x_1 = (1, 0, 0)$, then


$Ax_1 = 3x_1.$


It is also true that every non-zero multiple of $x_1$ is an eigenvector of A
(“non-zero” because that’s how eigenvectors are defined), but that true
statement is universally true and gives no new information. New
information is, however, available: it is also true that if $x_2 = (0, 1, 0)$, then


$Ax_2 = 3x_2.$


Both of the radically different vectors $x_1$ and $x_2$ are eigenvectors of A.
What goes on is that the set of all those vectors x that satisfy the equation


$Ax = 3x$


(including the vector 0) is a subspace of dimension 2; it is sometimes called
the eigenspace of A corresponding to the eigenvalue 3.

(The vector 0 is never regarded as an eigenvector, but the vector 0 is
always regarded as belonging to the eigenspace corresponding to an
eigenvalue $\lambda$. This apparently contradictory use of language might take a few
seconds to get used to, but it causes no trouble, and, in fact, it is more
convenient than being forced to deal with the awkward “punctured” subspace
obtained by considering only the non-zero solutions of $Ax = \lambda x$.)

In some plausible sense the number 3 occurs twice as an eigenvalue
of the matrix A above. If the multiplicity (in more detail the geometric
multiplicity) of an eigenvalue $\lambda$ of a transformation A is defined as the
dimension of the set of solutions of $Ax = \lambda x$, then in the example under
consideration the number 3 is an eigenvalue of geometric multiplicity 2.

If


$B = \begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{pmatrix},$


how do the facts for B compare with the facts for A? The number 3 is an
eigenvalue of B,


$Bx_1 = 3x_1,$


just as it was for A. The vector $x_2$, however, is not an eigenvector of B.
What is the geometric multiplicity of the eigenvalue 3 for B? The
question is one about solutions (u, v, w) of the equations


$3u + v = 3u,$
$3v = 3v,$
$4w = 3w.$


If (u, v, w) is a solution, then the last equation implies that w = 0 and
the first equation implies that v = 0. Consequence: the eigenspace
corresponding to the eigenvalue 3 for B is the set of all vectors of the form
(u, 0, 0), a space of dimension 1.

Isn't that just a little puzzling? The matrices A and B don't look very
different: both are upper triangular, they have the same diagonal, and,
consequently, they have the same characteristic polynomial, namely


$(\lambda - 3)^2(\lambda - 4) \quad (= \lambda^3 - 10\lambda^2 + 33\lambda - 36).$


The geometric multiplicity of 3 as an eigenvalue of A seems to be caused
by the exponent 2 on $(\lambda - 3)$, but that exponent is there for B also.

Well, that's life: the concept of multiplicity has in fact two distinct
meanings. In the already defined geometric meaning the number 3 has the
multiplicity 2 as an eigenvalue of A and the multiplicity 1 as an eigenvalue
of B, but it has the same multiplicities for A and B in the other sense. The
algebraic multiplicity of a number $\lambda_0$ as an eigenvalue of a linear
transformation is the number of times $\lambda_0$ occurs as a root of the characteristic
equation, or, better said, it is the exponent of $(\lambda - \lambda_0)$ in the characteristic
polynomial.
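
Both kinds of multiplicity can be computed for the two examples. The sketch below takes A to be the diagonal matrix with entries 3, 3, 4 and B to be the same matrix with an extra 1 in the first row, second column (the pair displayed again in Section 107). The geometric multiplicity is computed as n minus the rank of M - tI; the algebraic multiplicity is read off the diagonal, a shortcut that is valid only because both matrices are triangular.

```python
from fractions import Fraction

def rank(M):
    """Rank by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def geometric_multiplicity(M, t):
    """Dimension of the kernel of M - tI."""
    n = len(M)
    shifted = [[M[i][j] - (t if i == j else 0) for j in range(n)] for i in range(n)]
    return n - rank(shifted)

def algebraic_multiplicity_triangular(M, t):
    """Exponent of (x - t) in the characteristic polynomial.

    Valid for triangular matrices only: count the diagonal entries equal to t.
    """
    return sum(1 for i in range(len(M)) if M[i][i] == t)

A = [[3, 0, 0], [0, 3, 0], [0, 0, 4]]
B = [[3, 1, 0], [0, 3, 0], [0, 0, 4]]

for name, M in (("A", A), ("B", B)):
    print(name,
          "geometric:", geometric_multiplicity(M, 3),
          "algebraic:", algebraic_multiplicity_triangular(M, 3))
```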

It might be helpful to look at a natural concrete example. 


Problem 105. What are the geometric and algebraic multiplicities
of the eigenvalues of the differentiation transformation D on the space 
P3 of polynomials of degree less than or equal to 3? 


Reminder. The differentiation transformation was first mentioned as an 
example in Problem 54. 


106. Distinct eigenvalues 


What made the diagonalization possible in Problem 103? The easiest trans- 
formations to diagonalize are the scalars—the matrix of a scalar transfor- 
mation is diagonal with respect to every basis. If multiplicities are counted, 
as they always should be, a scalar transformation on a vector space of di- 
mension n has n eigenvalues—that is, one eigenvalue with multiplicity n. 
The opposite extreme to scalars are the transformations with n distinct 
eigenvalues—how difficult are they to diagonalize? 


Problem 106. If a linear transformation on a vector space of
dimension n has n distinct eigenvalues, must it be diagonalizable?


107. Comparison of multiplicities 


For the examples 


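$A = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{pmatrix}$ and $B = \begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{pmatrix}$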


of Problem 103, it turned out that 3 was an eigenvalue of algebraic multi- 
plicity 2 for both A and B, and it had geometric multiplicity 2 for A and 1 
for B. How difficult is it to find an example where the algebraic multiplicity 
is the smaller one? 


Problem 107. Does there exist a linear transformation on a
finite-dimensional vector space with an eigenvalue $\lambda$ whose algebraic
multiplicity is less than its geometric multiplicity?


108. Triangularization 


Can every matrix be diagonalized? (For a discussion of diagonalization see
Problems 103 and 106.) The answer is no, and the almost universal
counterexample $\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ proves it. Indeed the eigenspaces of a 2 x 2 diagonal
matrix have total dimension 2, but the “universal counterexample” has only
one eigenvalue (namely 0) and only one eigenvector (namely (1, 0) and its
scalar multiples). The example can be generalized: the matrix


$\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}$

has only one eigenvalue (namely 0) and only one eigenvector (namely
(1, 0, 0, 0) and its scalar multiples), whereas the eigenspaces of a
diagonalizable 4 x 4 matrix have total dimension 4. Similar statements apply of
course to the obvious generalizations of this 4 x 4 matrix to n x n for every
positive integer n.


Granted that not every matrix can be diagonalized, what's the next best
thing that can be done? The matrix


$\begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{pmatrix}$


is not so easy to work with as


$\begin{pmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{pmatrix},$

but it's not too bad: its eigenvalues (and their multiplicities) can be read 
off at a glance, and even its powers are easy to compute. (For instance


$\begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix}^n = \begin{pmatrix} 3^n & n3^{n-1} \\ 0 & 3^n \end{pmatrix}.$

Check?) It is tempting to guess that the next best thing to diagonalize is to
triangularize. Can that always be done? 
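
The suggested check of the power formula can be done by brute force. The sketch assumes that the formula in question is the standard one for the 2 x 2 block with 3 on the diagonal and 1 above it: the n-th power has 3^n on the diagonal and n times 3^(n-1) in the corner.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

J = [[3, 1], [0, 3]]

power = [[1, 0], [0, 1]]   # J^0
formula_holds = True
for n in range(1, 9):
    power = matmul(power, J)                      # now equal to J^n
    expected = [[3 ** n, n * 3 ** (n - 1)], [0, 3 ** n]]
    formula_holds = formula_holds and power == expected

print(formula_holds)
```

A finite check is of course not a proof; the general statement follows by induction on n.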


The characteristic property of a matrix in triangular form, for example
a 4 x 4 matrix such as


$\begin{pmatrix} * & * & * & * \\ 0 & * & * & * \\ 0 & 0 & * & * \\ 0 & 0 & 0 & * \end{pmatrix},$


is that there exists a basis consisting of vectors u, v, w, etc., (in the example
they are (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), etc.) such that

(1) Au is a scalar multiple of u (or, in plain English, such that u is an
eigenvector of A),

(2) Av is a linear combination of u and v,

(3) Aw is a linear combination of u, v, and w, etc.

In view of this comment a natural approach to trying to triangularize a 
matrix A is (1) to find an eigenvector u, (2) to find a vector v such that Av 
is a linear combination of u and v, etc., etc. The answer to the question is 
yes: every matrix can be triangularized. The proposed proof is by induction 
on the size n of the matrix. The beginning, n = 1, is easy enough: if n = 1,
there is nothing to do.

For an arbitrary n, the first step in any event is always possible: every
linear transformation on a complex vector space V has an eigenvector. 

The induction step has to be preceded by the observation that if M is 
the 1-dimensional space of all multiples of an eigenvector—an eigenspace 
—then the quotient space V/M has dimension n — 1 (Problem 52). Recall 
now that according to one definition of quotient space the elements of 
V/M are the vectors of V but with equality defined as congruence modulo 
M (Problem 51). Use that definition to define a linear transformation $A_M$
on V/M (called the quotient transformation induced by A) by writing


$A_M x = Ax$


for every x in V. This definition needs defense: it must be checked that it
is unambiguous. The trouble is (could be) that two “equal” vectors (that
is vectors that are congruent modulo M) might have unequal images. The
defense, in other words, must prove that if $x \equiv y \bmod M$, then
$Ax \equiv Ay \bmod M$, or, equivalently, that if x - y is in M, then A(x - y) is in M. In
that form the implication is obvious: it asserts no more and no less than
that M is invariant under A.
The ground is now prepared for the induction step: since 


$\dim V/M = n - 1,$


the transformation $A_M$ on V/M can be triangularized. That means, as a
first step, that there exist vectors v, w, ... in V (considered here as a
photograph of V/M) such that $A_M v$ is a scalar multiple of v, $A_M w$ is a linear
combination of v and w, etc. In different language: Av is equal to a scalar
multiple of v plus an element of M, Aw is equal to a linear combination of
v and w plus an element of M, etc. Conclusion: not only is Au a scalar
multiple of u, but also Av is a linear combination of u and v, and Aw is a linear
combination of u, v, and w, etc.—and that says exactly that A has been 
triangularized. Conclusion: every transformation can be triangularized. 
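
The outline can be followed on a small case. The sketch below uses a hypothetical 2 x 2 example, the matrix with rows (10, 3) and (-5, 2): u = (1, -1) is an eigenvector (eigenvalue 7), and v = (0, 1) is simply some vector that is not a multiple of u. With P the matrix whose columns are u and v, the matrix of the same transformation in the new basis is P^{-1} A P; in the notation of a transformer T with T M T^{-1} triangular, that corresponds to T = P^{-1}.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse2(P):
    """Exact inverse of an invertible 2 x 2 matrix."""
    (a, b), (c, d) = P
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[10, 3], [-5, 2]]   # hypothetical example matrix
u = (1, -1)              # eigenvector: A u = 7 u
v = (0, 1)               # any vector outside the span M of u

# Columns of P are the new basis vectors u and v.
P = [[u[0], v[0]], [u[1], v[1]]]
triangular = matmul(matmul(inverse2(P), A), P)
print(triangular)
```

The second column records that Av = 3u + 5v, that is, Av is a scalar multiple of v (namely 5v) plus an element of M, exactly as in the induction step.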

To fix in one's mind this outline of an argument, it might be a good 
idea to follow it in a couple of concrete numerical cases, and that's what 
the following problem suggests. 


Problem 108. Triangularize both the matrices
1 1 0 1 1 0 
A—-|-1 2 1 and B={-4 5 0 
3 3 
Are they similar? 


Comment. To “triangularize” a matrix M it is not enough to exhibit a
triangular matrix $M_0$ and to prove that M and $M_0$ are similar. What is wanted
is either the explicit determination of a new basis with respect to which the
new matrix of the same linear transformation is triangular, or, equivalently,
the explicit determination of an invertible matrix T, the transformer, such
that $TMT^{-1}$ is triangular. In the course of looking for a suitable basis the
eigenvalues of M should become visible—which is frequently preceded by 
the determination of the characteristic polynomial and followed by the de- 
termination of the eigenvectors. 


Reminder. The theories of upper and lower triangularization are boringly 
alike; in the discussion above, for no especially good reason, upper was 
emphasized. 


109. Complexification 


Does every linear transformation on $R^n$ have an invariant subspace of
dimension equal to 1? To ask that is the same as asking whether every linear
transformation on $R^n$ has an eigenvector, and the answer to that is
obviously no (see Problem 103). What happens if the question is liberalized a
little? 


Problem 109. If n > 1, does every linear transformation on $R^n$
have an invariant subspace of dimension equal to 2?


110. Unipotent transformations 


Problem 110. If a linear transformation A on a finite-dimensional
(complex) vector space is such that $A^k = 1$ for some positive integer
k, must A be diagonalizable?


Comment. Transformations some positive power of which is equal to the
identity are sometimes called unipotent. 


111. Nilpotence


Transformations with many distinct eigenvalues are diagonalizable (Prob- 
lem 106); does that imply that if a transformation has only a small number 
of eigenvalues, then it is difficult to diagonalize? A clue to the answer can 
be found in the triangularization discussions of transformations with just 
one eigenvalue (Problem 105). 

In the study of linear transformations with just one eigenvalue, the 
actual numerical value of that eigenvalue can’t matter much: if A has the 
unique eigenvalue $\alpha$, then $A - \alpha I$ has the unique eigenvalue 0, and
$(A - \alpha I) + \beta I$ has the unique eigenvalue $\beta$. Since the addition of a scalar
cannot produce any major changes, the question might as well be restricted 
to the easiest eigenvalue to work with, namely 0. 

How easy is it to find examples of linear transformations whose only 
eigenvalue is 0? One example is the ubiquitous 


(13): 
A- E as 


Is the latter obvious? The statement is surely not a deep one (a few
seconds' calculation shows that the characteristic polynomial of A is $\lambda^2$), but
a special point of view on it is usefully generalizable. To wit: since an
equation such as


Another is 


$Ax = \lambda x$
implies that
$A^2 x = \lambda^2 x,$


it follows from $A^2 = 0$ (a few microseconds’ calculation) that $\lambda = 0$ (except
in the degenerate case x = 0). Generalization: if a linear transformation
A is such that $A^q = 0$ for some positive integer q, then its only eigenvalue
is 0. The proof is the same as the one just seen: if


$Ax = \lambda x,$
then
$A^q x = \lambda^q x,$


and therefore $\lambda = 0$ (or x = 0).

A linear transformation A such that $A^q = 0$ for some positive integer q
is called nilpotent; the smallest q that works is called its index of nilpotence.
The observation of the preceding paragraph was that if A is nilpotent, then 
spec A consists of 0 alone. The converse is also true, but it is slightly deeper. 
Indeed: if spec A = {0}, then a triangularization of A (see Problem 105) 
has zeroes on as well as below the main diagonal, and a triangular matrix 
like that is nilpotent. The reason is that if

$$A = \begin{pmatrix} 0 & * & * & * & * \\ 0 & 0 & * & * & * \\ 0 & 0 & 0 & * & * \\ 0 & 0 & 0 & 0 & * \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$

is squared, then the result has the form

$$A^2 = \begin{pmatrix} 0 & 0 & * & * & * \\ 0 & 0 & 0 & * & * \\ 0 & 0 & 0 & 0 & * \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$

Emphasis: the diagonal just above the main one consists of zeroes only.
Multiply by A again: for $A^3$ the two diagonals just above the main one
consist of zeroes only. Continue this way, and infer that $A^n = 0$, where n
is the size of the matrix (here n = 5). Conclusion: A is nilpotent (but a
calculation of this sort does not reveal its exact index of nilpotence).
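The diagonal-by-diagonal argument can be checked by direct computation. What follows is a minimal Python sketch, not part of the original text; the helper names `matmul` and `nilpotence_index` and the sample entries are illustrative assumptions.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def nilpotence_index(a):
    """Return the smallest q >= 1 with a**q == 0, or None if a**n != 0."""
    n = len(a)
    zero = [[0] * n for _ in range(n)]
    power = a
    for q in range(1, n + 1):
        if power == zero:
            return q
        power = matmul(power, a)
    return None


# Strictly upper triangular, every entry just above the diagonal non-zero.
A = [[0, 1, 2, 3, 4],
     [0, 0, 5, 6, 7],
     [0, 0, 0, 8, 9],
     [0, 0, 0, 0, 10],
     [0, 0, 0, 0, 0]]

# Strictly upper triangular again, but with zeroes just above the diagonal.
B = [[0, 0, 1],
     [0, 0, 0],
     [0, 0, 0]]

print(nilpotence_index(A))  # index of the 5 x 5 sample
print(nilpotence_index(B))  # index of the 3 x 3 sample
```

Both samples vanish by the power equal to their size, as the argument predicts, but their indices differ from one another in how they compare with the size; that is the sense in which the argument does not reveal the exact index.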


Problem 111. If a linear transformation A on a finite-dimensional
vector space is nilpotent of index q, and if, for each vector x in the
space, a subspace M(x) is defined as the span of the vectors

$$x, Ax, A^2 x, \ldots, A^{q-1} x,$$


how large can the dimension of M(x) be? 


112. Nilpotent products 


An obvious exercise about nilpotence is to ask whether the product of two 
nilpotent transformations is necessarily nilpotent. The answer is yes if they 
commute, but the answer is no in general; a standard easy example is given 


by the two matrices

$$\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}.$$

That doesn’t settle everything though; there are tricky nilpotence questions
that arise in some contexts and to which the answer is not predictable just 
from the existence of examples such as these. 
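A quick computation illustrates the failure for non-commuting factors. The sketch below is an illustration only; it assumes the standard pair of 2 x 2 matrices with a single 1 above, respectively below, the diagonal, and the helper `matmul` is a hypothetical name.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = [[0, 1], [0, 0]]
B = [[0, 0], [1, 0]]
zero = [[0, 0], [0, 0]]

AB = matmul(A, B)
squares_vanish = matmul(A, A) == zero and matmul(B, B) == zero
product_is_idempotent = matmul(AB, AB) == AB

print(squares_vanish)         # each factor is nilpotent of index 2
print(AB)                     # the product itself
print(product_is_idempotent)  # a non-zero idempotent has no vanishing power
```

Since AB is non-zero and equal to its own square, every power of AB equals AB, so AB cannot be nilpotent.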


Problem 112. If A and B are linear transformations on the same
vector space such that ABAB = 0, does it follow that BABA = 0?


113. Nilpotent direct sums 


If A, B, and C are nilpotent matrices of sizes 6, 4, and 4 respectively and 
indices of nilpotence also 6, 4, and 4 respectively, is the matrix 


$$M = \begin{pmatrix} A & 0 & 0 \\ 0 & B & 0 \\ 0 & 0 & C \end{pmatrix}$$


also nilpotent? That’s a trivial question; the 14 x 14 matrix M is obviously 
nilpotent of index 6. 
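The claim about the direct sum is easy to confirm by machine. The following Python sketch is one possible check, under the assumption that A, B, and C are taken to be the simplest nilpotent matrices of the stated sizes and indices (ones just above the diagonal, zeroes elsewhere); all function names are hypothetical.

```python
def jordan_block(n):
    """n x n nilpotent block: ones just above the diagonal, zeroes elsewhere."""
    return [[1 if j == i + 1 else 0 for j in range(n)] for i in range(n)]


def direct_sum(*blocks):
    """Place the given square blocks along the diagonal of a larger matrix."""
    size = sum(len(b) for b in blocks)
    m = [[0] * size for _ in range(size)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, entry in enumerate(row):
                m[offset + i][offset + j] = entry
        offset += len(b)
    return m


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def nilpotence_index(a):
    """Return the smallest q >= 1 with a**q == 0, or None if a**n != 0."""
    n = len(a)
    zero = [[0] * n for _ in range(n)]
    power = a
    for q in range(1, n + 1):
        if power == zero:
            return q
        power = matmul(power, a)
    return None


M = direct_sum(jordan_block(6), jordan_block(4), jordan_block(4))
print(len(M), nilpotence_index(M))  # size and index of the direct sum
```

The index of a direct sum is the largest of the indices of the summands, because a power of M is the direct sum of the same powers of the blocks.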

How else can one obtain a nilpotent matrix of size 14 and index 6? One 
easy answer is just to juggle the numbers: replace the sizes and indices of 
B and C, for instance, by 5 and 3, or replace the sizes and indices of B and 
C by 6 and 2, etc. These examples are direct sums; they are obtained by 
gluing together examples of the same or smaller size. What other way of 
manufacturing nilpotent matrices is there? 

The result of Problem 111 implies that if M is a nilpotent matrix of
index 3, say, then there exists a vector x such that the vectors

$$x, Mx, M^2 x$$

are linearly independent. Extend that linearly independent set to a basis
and write down the matrix of M with respect to that basis. The result looks
like

$$\begin{pmatrix} A & X \\ 0 & B \end{pmatrix},$$

where

$$A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix},$$

and B is a matrix that must be nilpotent also, with index less than or equal
to 3. Question: can X be thrown away? Precisely: is

$$\begin{pmatrix} A & X \\ 0 & B \end{pmatrix}$$

similar to

$$\begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}?$$

That question in its general form is one of the most important ones
in linear algebra and its answer is correspondingly difficult. It isn't all that
difficult (the methods used so far serve to prove that the answer is yes),
but it tends to be longish and complicated. A slight feeling for the spirit of
the answer can be obtained by working out a very easy special case; here
is one.


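Problem 113. What is a basis with respect to which the linear
transformation defined by the matrix

$$M = \begin{pmatrix} 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & -1 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$

has the matrix

$$\begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}?$$

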
114. Jordan form 


What happens when the general theorem that exhibits a nilpotent matrix 
as a direct sum (Problem 108) is applied repeatedly? The theorem says 
that with respect to a suitable basis every nilpotent matrix of index q, say, 


has the form

$$M = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix},$$

where A and B have the following special properties.

(1) A is a $q \times q$ matrix of the same form (Jordan form) as the $3 \times 3$ matrix

$$\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}$$

described in Problem 108, meaning that the entries on the diagonal
just above the main one are 1 and all others are 0, and

(2) B is nilpotent with index less than or equal to q.


To apply that result the second time means to apply it to B. The result 
is a representation of B as a direct sum of two nilpotent matrices, of which 
the first is in Jordan form, with index equal to the index of B (and with 
size same as its index). Application of the method “repeatedly” as often as 
possible yields a matrix representation for M of the form 


$$\begin{pmatrix} A_1 & 0 & 0 \\ 0 & A_2 & 0 \\ 0 & 0 & A_3 \end{pmatrix}.$$

Here each $A_i$ on the diagonal is a nilpotent matrix in Jordan form, of size
and index $q_i$, with $q_1 \geq q_2 \geq q_3 \geq \cdots$. This is called the Jordan form of
M.

Could it be true that every matrix can be obtained by gluing together 
easy ones? Could it, for instance, be true that the “zero part” of every
matrix can be split off and studied separately? What might that mean? Well,
a possible hope is that every matrix is (or is similar to?) a direct sum, such
as

$$\begin{pmatrix} 0 & 0 \\ 0 & 3 \end{pmatrix},$$

of a zero matrix and a non-zero matrix, but that is not true. Example:

$$\begin{pmatrix} 0 & 3 \\ 0 & 0 \end{pmatrix}.$$

The weakest way of saying “zero” and the strongest way of saying “non- 
zero” suggest a modified hope: could it be that every matrix is (or is similar 
to) a direct sum of a nilpotent matrix and an invertible matrix? Yes; that is 
true, and the result is known as Fitting’s lemma. 

The invertible direct summand that Fitting’s lemma yields (call it M) 
does not have 0 in its spectrum, but since it does have some eigenvalue 
$\lambda$, Fitting's lemma is applicable to the matrix $M - \lambda$. Consequence: M is
representable as a direct sum that exactly resembles the one displayed in
the nilpotent case, but with direct summands $A_i$

$$\begin{pmatrix} \lambda & 1 & 0 \\ 0 & \lambda & 1 \\ 0 & 0 & \lambda \end{pmatrix}$$

on whose main diagonal $\lambda$ appears instead of 0. In an obvious extension of
the terminology introduced before, that form is known as the Jordan form 
of A. 

Arguing the same way separately for each eigenvalue of an arbitrary 
matrix M leads to the grand conclusion—the assertion that every matrix 
is similar to a direct sum of matrices $M_1, M_2, M_3, \ldots$ in Jordan form, with
distinct eigenvalues. That direct sum is called the Jordan form of M (in 
the second and final broadening of that expression), and the possibility of 
representing every M that way is the apex of linear algebra. It is difficult 
to think of an answerable question about linear transformations whose an- 
swer is not a consequence of representability in Jordan form; here is a small 
but pleasant sample. 


Problem 114. Does every matrix have a square root? 


Comment. If $A = B^2$, then, of course, B is called a square root of A.


115. Minimal polynomials 


Is every matrix algebraic? The language is borrowed from the theory of 
algebras: an element a of an algebra is called algebraic if there exists a 
non-zero polynomial p such that p(a) = 0. Example: if E is a projection,
and if $p(\lambda) = \lambda^2 - \lambda$, then p(E) = 0. Another example: if A is nilpotent of
index q and if $p(\lambda) = \lambda^q$, then p(A) = 0.

The second example can be generalized: if a linear transformation $A_0$
on a finite-dimensional vector space has only one eigenvalue, $\lambda_0$ say, then
$A_0 - \lambda_0 I$ is nilpotent of index $q_0$ say, and therefore the polynomial

$$m_0(\lambda) = (\lambda - \lambda_0)^{q_0}$$

annihilates $A_0$. Important note: $q_0$ is the smallest degree that such a
polynomial can have, and $m_0(\lambda)$ is the unique monic polynomial of that degree
that does the job.

If a linear transformation has not just one eigenvalue but two, $\lambda_1$ and
$\lambda_2$, then its Jordan form looks like

$$M = \begin{pmatrix} M_1 & 0 \\ 0 & M_2 \end{pmatrix},$$

where $M_1 - \lambda_1$ and $M_2 - \lambda_2$ are nilpotent, with some indexes $q_1$ and $q_2$. It
follows that there is one and only one monic polynomial of minimal degree
that annihilates M, namely the polynomial

$$m(\lambda) = (\lambda - \lambda_1)^{q_1} (\lambda - \lambda_2)^{q_2}.$$

The number 2 has nothing to do with this statement or its proof. The
general statement is that for every matrix A there exists a unique
monic polynomial of minimal degree that annihilates A; it is called the
minimal polynomial of A. This minimal polynomial is, in fact, equal to
the product of the factors of the form $(\lambda - \lambda_i)^{q_i}$ obtained by letting the
$\lambda_i$'s range through the distinct eigenvalues of A; the $q_i$'s are corresponding
indexes in the Jordan (or triangular) form.

Since each factor of the minimal polynomial is a factor of the
characteristic polynomial also, it follows that if the characteristic polynomial of
A is p, then p(A) = 0; this famous statement is known as the
Hamilton-Cayley equation.
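For a 2 x 2 matrix the Hamilton-Cayley equation can be verified by hand or by a few lines of code. The sketch below is only an illustration, restricted to the 2 x 2 case, where the characteristic polynomial is $\lambda^2 - (\operatorname{tr} A)\lambda + \det A$; the function names and the sample matrix are assumptions made for the example.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly_at_matrix(a):
    """Evaluate A^2 - (trace A) A + (det A) I for a 2 x 2 matrix A."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    a2 = matmul(a, a)
    identity = [[1, 0], [0, 1]]
    return [[a2[i][j] - trace * a[i][j] + det * identity[i][j]
             for j in range(2)] for i in range(2)]


A = [[2, -1], [5, 3]]
print(char_poly_at_matrix(A))  # the zero matrix, as the equation asserts
```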

If the minimal polynomial of a linear transformation on a space of
dimension n has degree n, does it follow that the transformation is
diagonalizable? Answer: no, trivially no; a counterexample is

$$\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.$$

How do the minimal polynomial and the characteristic polynomial of
a diagonal matrix compare? Answer: if

$$A = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{pmatrix},$$


then the characteristic polynomial is the product of all the $(\lambda - \lambda_i)$'s, but
the minimal polynomial is the product of just one representative of each
possible factor. So, for example, if A is a diagonal matrix each of whose
diagonal entries is 1 or 2, with a entries equal to 1 and b entries equal
to 2 (both a and b positive), then the characteristic polynomial is
$(\lambda - 1)^a (\lambda - 2)^b$, and the minimal polynomial is $(\lambda - 1)(\lambda - 2)$.
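The contrast between the two polynomials of a diagonal matrix can be seen concretely. The following sketch assumes a hypothetical sample, the 5 x 5 diagonal matrix with diagonal entries 1, 1, 1, 2, 2 (chosen for illustration, not taken from the text), and checks that the product of one factor per distinct eigenvalue already annihilates it.

```python
def diag(entries):
    """Diagonal matrix with the given diagonal entries."""
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def shifted(a, c):
    """Return a - c*I."""
    n = len(a)
    return [[a[i][j] - (c if i == j else 0) for j in range(n)]
            for i in range(n)]


A = diag([1, 1, 1, 2, 2])  # hypothetical sample matrix
zero = diag([0] * 5)

product = matmul(shifted(A, 1), shifted(A, 2))
print(product == zero)          # (A - 1)(A - 2) annihilates A
print(shifted(A, 1) == zero)    # the factor A - 1 alone does not
print(shifted(A, 2) == zero)    # nor does the factor A - 2
```

For this sample the minimal polynomial has degree 2, while the characteristic polynomial, with one factor per diagonal entry, has degree 5.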

These examples were trying to cultivate friendship toward minimal 
polynomials; the following problem might test the success of the attempt. 


Problem 115. What are the minimal polynomial and the
characteristic polynomial of the differentiation transformation D and the
translation transformation T on the space $\mathbb{P}_3$ of polynomials
of degree less than or equal to 3? (Reminder: $Dx(t) = \frac{dx}{dt}$ and
$Tx(t) = x(t + 1)$.)


116. Non-commutative Lagrange interpolation


Is there a polynomial p such that $p(n) = 10^n$ for every integer n between
1 and 100 inclusive? Sure, why not: in fact there is a polynomial of degree 
99 that does that. All you have to do to prove it is to write down the perti- 
nent system of 100 (linear) equations in the 100 unknown coefficients, and 
note that its determinant (a special instance of a Vandermonde) is not 0. 
It is easy to formulate a general theorem of which this result is a special 
case: if $x_1, \ldots, x_n$ are n distinct numbers (the avoidance of repetitions is
essential), and if $y_1, \ldots, y_n$ are any n numbers, then there exists a (unique)
polynomial p of degree at most n − 1 such that

$$p(x_j) = y_j$$

for $j = 1, \ldots, n$. The celebrated Lagrange interpolation formula is an
explicit presentation of that polynomial. The “numbers” in the general
theorem can be replaced by elements in an arbitrary field, and the result is a
polynomial with coefficients in that field. 
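The Lagrange formula itself is short enough to write down as code. The sketch below is a minimal illustration with exact rational arithmetic; the function name `lagrange_eval` is hypothetical, and the sample uses only the first three of the hundred nodes mentioned above to keep the numbers small.

```python
from fractions import Fraction


def lagrange_eval(xs, ys, t):
    """Value at t of the polynomial of degree < len(xs) through (xs[j], ys[j])."""
    if len(set(xs)) != len(xs):
        raise ValueError("interpolation nodes must be distinct")
    total = Fraction(0)
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        # Build y_j times the product of (t - x_k) / (x_j - x_k) over k != j.
        term = Fraction(yj)
        for k, xk in enumerate(xs):
            if k != j:
                term *= Fraction(t - xk, xj - xk)
        total += term
    return total


xs = [1, 2, 3]
ys = [10 ** n for n in xs]

print([lagrange_eval(xs, ys, n) for n in xs])  # reproduces 10, 100, 1000
print(lagrange_eval(xs, ys, 4))                # the quadratic's value at 4
```

The distinctness check mirrors the remark that the avoidance of repetitions is essential: a repeated node would put a zero in a denominator.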

Once all that is granted, a shallow generalization is easy to come by: 
if $X_1, \ldots, X_n$ are pairwise disjoint finite sets of numbers (or elements of
an arbitrary field), and if $p_1, \ldots, p_n$ are arbitrary polynomials (with
coefficients in the same field), then there exists a polynomial p such that

$$p(x) = p_j(x)$$

whenever $x \in X_j$, $j = 1, \ldots, n$. Proof: apply the Lagrange interpolation
theorem to the set of (distinct) numbers x in $X_1 \cup \cdots \cup X_n$ with the
corresponding values y chosen to be $p_j(x)$ whenever $x \in X_j$. (The smallest
possible degree of p is easy to describe in terms of the number n and the
sizes of the sets $X_1, \ldots, X_n$.) A statement of this generalization in terms
of matrices goes like this: if $M_1, \ldots, M_n$ are diagonal matrices with
pairwise disjoint spectra (that is, pairwise disjoint sets of eigenvalues, or, what
comes to exactly the same thing, pairwise disjoint sets of diagonal entries),
and if $p_1, \ldots, p_n$ are polynomials, then there exists a polynomial p such that

$$p(M_j) = p_j(M_j)$$

for $j = 1, \ldots, n$.

The most conspicuous fact about diagonal matrices is that they all 
commute with one another; the matrix Lagrange interpolation theorem 
that was just formulated belongs to commutative linear algebra. Does its 
straightforward non-commutative generalization have a chance of being 
true, or does the conclusion itself imply some kind of partial commuta- 
tivity? To avoid extraneous trouble with non-existence of eigenvalues, the 
straightforward generalization will be formulated over the field C of
complex numbers.


Problem 116. Is it always true that if $A_1, \ldots, A_n$ are linear
transformations with pairwise disjoint spectra (that is, pairwise disjoint
sets of eigenvalues) on a finite-dimensional vector space V over C,
and if $p_1, \ldots, p_n$ are polynomials with coefficients in C, then there
exists a polynomial p with coefficients in C such that

$$p(A_j) = p_j(A_j)$$

for $j = 1, \ldots, n$?


CHAPTER 8


INNER PRODUCT SPACES 


117. Inner products 


Which of these vectors in $\mathbb{R}^3$ is larger: (2, 3, 5) or (3, 4, 4)? Does the
question make sense? The only sizes, the only numbers, that have been
considered so far are dimensions. Since (2, 3, 5) belongs to $\mathbb{R}^3$ and (1, 1, 1, 1)
belongs to $\mathbb{R}^4$, the latter is in some sense larger, but it is a weak sense and
not a useful one. The time has come to look at the classical and useful way
of “measuring” vectors.

The central concept is that of an inner product in a real or complex 
vector space. That is, by definition, a 

(1) Hermitian symmetric, 

(2) conjugate bilinear, 
and 

(3) positive definite 
form, which means that it is a numerically valued function of ordered
pairs of vectors x and y such that

$$(x, y) = \overline{(y, x)}, \tag{1}$$

$$(\alpha_1 x_1 + \alpha_2 x_2, y) = \alpha_1 (x_1, y) + \alpha_2 (x_2, y), \tag{2}$$

$$(x, x) \geq 0; \quad (x, x) = 0 \text{ if and only if } x = 0. \tag{3}$$


Standard examples: for $x = (\xi_1, \xi_2)$ and $y = (\eta_1, \eta_2)$ in $\mathbb{R}^2$, write

$$(x, y) = \xi_1 \overline{\eta_1} + \xi_2 \overline{\eta_2}$$

(reminiscent of the formula for the cosine of the angle between two segments),
and for x and y in $\mathbb{P}$, write

$$(x, y) = \int x(t) \overline{y(t)} \, dt$$

(a formal “continuous” analog of the sum in $\mathbb{R}^2$ and of its natural
generalization in $\mathbb{R}^n$).

The upper bars here denote complex conjugation; the reason they are 
necessary has to do with the associated notion of length. The point is that the 
length (or norm) of a vector x is defined by 


$$\|x\| = \sqrt{(x, x)}.$$

If the formula $\xi_1 \eta_1 + \xi_2 \eta_2$ had been used (instead of $\xi_1 \overline{\eta_1} + \xi_2 \overline{\eta_2}$), then for
a vector x in $\mathbb{C}^2$ the consideration of its scalar multiple ix (where $i = \sqrt{-1}$)
would lead to an unpleasant surprise. The relation between inner products
and scalars would yield

$$\|ix\|^2 = (ix, ix) = i(x, ix) = i^2 (x, x) = -\|x\|^2,$$


and that could be regarded as unpleasant. The square of a length shouldn't 
really be negative—that would lead to a length whose value is an imaginary 
number, and that is not the sort of thing one normally thinks of as a suitable 
measure of size. 
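For comparison, here is the same computation carried out with the conjugated formula, as a short worked sketch. It assumes only properties (1) and (2) of an inner product, which together imply that a scalar comes out of the second argument conjugated.

```latex
\|ix\|^2 = (ix, ix) = i\,(x, ix) = i\,\overline{i}\,(x, x) = (x, x) = \|x\|^2 .
```

With the bars in place, multiplication by i leaves the length unchanged, as a rotation should.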

An inner product space is a vector space with an inner product. The 
intuitive interpretation of (x,y) is the cosine of the angle between x and y, 
and, correspondingly, if (x, y) = 0—cosine equal to 0—the vectors x and y 
are called orthogonal (= perpendicular). To what extent is this metric concept 
in harmony with the linear concepts treated so far? 


Problem 117. How large does a finite orthogonal set of non-zero 
vectors have to be to be linearly dependent? 


Comment. A set of vectors is called orthogonal if each pair of its elements is 
orthogonal. Recall that when a set of vectors is enlarged, a linear dependence 
relation between them becomes more likely than it was before. 


118. Polarization


The norm in an inner product space is defined in terms of the inner product.
Is there any hope of going in the other direction?


Problem 118. Can two different inner products yield the same 
norm? 


119. The Pythagorean theorem


The Pythagorean theorem says that the sum of the squares of two sides of 
a right triangle is equal to the square of the hypotenuse. Is anything like 
that true for vector spaces in general? 


Problem 119. Under what conditions on two vectors x and y is it 
true that 


$$\|x + y\|^2 = \|x\|^2 + \|y\|^2?$$


120. The parallelogram law 


Problem 120. Under what conditions on two vectors x and y is it 
true that 


$$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2?$$


Comment. The equation is not as strange as at first it might appear. Think 
about pictures: if x and y are two intersecting sides of a parallelogram, then
x + y and x − y can be thought of as its two diagonals. The “parallelogram
law” of elementary geometry is exactly the equation under consideration. 


121. Complete orthonormal sets 


How large can an orthogonal set be? One possible interpretation of that 
question is this: for which values of n is it possible to find an orthogonal 
set $(x_1, \ldots, x_n)$ of vectors in an inner product space? That's not quite a
sensible interpretation: the notation allows many (all) of the $x_i$'s to be 0,
and in that sense n can be chosen arbitrarily large. An efficient way to rule
out that uninformative interpretation of the question is to “normalize” the
vectors that are allowed to enter. In the language that is customary in this
circle of ideas, to say that a vector x is normal or normalized means that
$\|x\| = 1$, and, correspondingly, an orthogonal set $(x_1, x_2, \ldots)$ is called
orthonormal if $(x_i, x_j) = \delta_{ij}$ for all i and j.

An orthonormal set is called complete if it is maximal, that is, if it can- 
not be enlarged, or, in other words, if it is not a subset of any larger or- 
thonormal set. Since orthonormal sets are linearly independent (Problem 
117), an inner product space of dimension n cannot have orthonormal sets 
with more than n elements. Can it always have that many? 


Problem 121. Does every inner product space of dimension n have
an orthonormal set of n elements? 


122. Schwarz inequality 


In what way are orthonormal bases better than just plain bases? A partial 
answer is that when a vector x is expanded in terms of an orthonormal
basis the coefficients give precise information about the size of the vector.
If, in fact, $\{x_1, \ldots, x_n\}$ is an orthonormal set (not even necessarily a basis),
and x is an arbitrary vector, then

$$\begin{aligned}
0 &\leq \Big\| x - \sum_i (x, x_i) x_i \Big\|^2 \\
&= \Big( x - \sum_i (x, x_i) x_i,\; x - \sum_j (x, x_j) x_j \Big) \\
&= (x, x) - \sum_i (x, x_i)(x_i, x) - \sum_j (x_j, x)(x, x_j) + \sum_i \sum_j (x, x_i)(x_j, x)(x_i, x_j) \\
&= \|x\|^2 - \sum_i |(x, x_i)|^2 - \sum_i |(x, x_i)|^2 + \sum_i |(x, x_i)|^2.
\end{aligned}$$

Consequence:

$$\sum_i |(x, x_i)|^2 \leq \|x\|^2.$$


This result is known as Bessel’s inequality. It has two important conse- 
quences. 
(1) If x and y are vectors in an inner product space, then

$$|(x, y)| \leq \|x\| \cdot \|y\|.$$

This result is known as the Schwarz inequality. It can be derived from
Bessel's inequality as follows: if y = 0, both sides are 0; if $y \neq 0$, the
set consisting of the vector $y / \|y\|$ only is orthonormal, and consequently, by
Bessel's inequality,

$$\Big| \Big( x, \frac{y}{\|y\|} \Big) \Big| \leq \|x\|.$$

(2) If x is a vector and $\{x_1, \ldots, x_n\}$ is an orthonormal basis in an inner
product space, then

$$\|x\|^2 = \sum_i |(x, x_i)|^2.$$

To prove that, observe that if x is expanded in terms of the $x_i$'s,

$$x = \sum_j \alpha_j x_j,$$

then forming the inner product of each side with itself yields

$$\|x\|^2 = \Big( \sum_i \alpha_i x_i, \sum_j \alpha_j x_j \Big) = \sum_i \sum_j \alpha_i \overline{\alpha_j} (x_i, x_j) = \sum_k |\alpha_k|^2,$$

and forming the inner product of both sides with each $x_k$ yields

$$(x, x_k) = \alpha_k.$$

The equation

$$\|x\|^2 = \sum_i |(x, x_i)|^2$$

is known as Parseval's identity. Note: a small modification of the technique
proves a more general result: if $x = \sum_j \alpha_j x_j$ and $y = \sum_j \beta_j x_j$, then

$$(x, y) = \sum_j \alpha_j \overline{\beta_j}.$$
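Bessel's inequality and Parseval's identity can be watched at work on a small example. The sketch below is an illustration under stated assumptions: it uses the standard inner product on real 3-space (so no conjugation is needed), exact rational arithmetic, and a hypothetical orthonormal basis built from the 3-4-5 triangle.

```python
from fractions import Fraction as F


def inner(x, y):
    """Standard inner product on real n-space."""
    return sum(a * b for a, b in zip(x, y))


# A hypothetical orthonormal basis of real 3-space with rational entries.
x1 = (F(3, 5), F(4, 5), F(0))
x2 = (F(-4, 5), F(3, 5), F(0))
x3 = (F(0), F(0), F(1))

x = (F(1), F(2), F(3))
norm_sq = inner(x, x)

# Bessel: an orthonormal set that is not a basis gives an inequality.
bessel_sum = sum(inner(x, u) ** 2 for u in (x1, x2))

# Parseval: a full orthonormal basis gives equality.
parseval_sum = sum(inner(x, u) ** 2 for u in (x1, x2, x3))

print(bessel_sum, norm_sq, parseval_sum)
```

Dropping $x_3$ from the basis loses exactly the square of the corresponding coefficient, which is the gap between the two sides of Bessel's inequality.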


Bessel and Schwarz and Parseval are part of the standard lore of this 
subject. The answer to the next question is equally well known to the ex- 
perts, but it is slightly more recondite and, perhaps, a little more fun. 


Problem 122. For which pairs of vectors does the Schwarz inequal- 
ity become an equation? 


123. Orthogonal complements 


Just how much is orthogonality in “harmony” with linearity? What is already
known is that vectors that differ a lot in the metric sense differ a
lot in the linear sense too (orthonormal sets are linearly independent, see 
Problem 117). What if a bunch of vectors all have a common (orthogonal) 
enemy—does it follow that they are all (linear) friends? A sharp formu- 
lation of that vague question has to do with what are called orthogonal 
complements. 

If E is a set of vectors in an inner product space V, the orthogonal
complement $E^\perp$ (pronounced “E perp”) of E is the set of all those vectors
in V that are orthogonal to every vector in E. It is an easy exercise to verify
that $E^\perp$ is a subspace of V (it doesn’t matter whether E is a subspace or
not), and saying the meanings of the symbols slowly ought to convince
anyone that $E \subset E^{\perp\perp}$. (Is the intended meaning of $E^{\perp\perp}$ clear? It is $(E^\perp)^\perp$.)
Consequence:

$$\operatorname{span} E \subset E^{\perp\perp}.$$


Problem 123. If M is a subspace of a finite-dimensional inner product
space V, what are the relations among M, $M^\perp$, and $M^{\perp\perp}$?


124. More linear functionals 


Is the resemblance between the superscripts such as in $M^0$ (annihilators of
subspaces) and the ones in $M^\perp$ (orthogonal complements of subspaces) a
structural one or merely notational? It turns out that the question is really 
one about linear functionals. 

Linear functionals on an inner product space V are easy enough to 
come by: fix an element y in V and then define a function $\xi$ on V by writing

$$\xi(x) = (x, y)$$

for all x. That $\xi$ is a linear functional is an immediate consequence of the
defining properties of inner products. Are the linear functionals obtained 
in this way typical? 


Problem 124. If $\xi$ is a linear functional on an inner product space
V, does there always exist a vector y in V such that

$$\xi(x) = (x, y)$$


for all x? 


125. Adjoints on inner product spaces 


Is a vector in an inner product space the same as a linear functional? That 
may look like a foolish question, but (a) linear algebraists are used to con- 
sidering linear functionals as vectors (elements of the dual space), and (b) 
in an inner product space each vector induces (is?) a linear functional, the 
one defined by inner products. In fact the correspondence that assigns to 
each vector y in an inner product space V the linear functional $\xi$ that it
induces,

$$\xi(x) = (x, y),$$

is a one-to-one correspondence between all of V and all of the dual space
V'. One-to-one? Sure: if $y_1$ and $y_2$ correspond to the same $\xi$, then

$$(x, y_1) = (x, y_2)$$

for all x, so that

$$(x, y_1 - y_2) = 0$$

for all x, so that $y_1 - y_2$ is orthogonal to every vector x, and therefore
$y_1 - y_2 = 0$. All of V'? Sure: that's what Solution 123 proves.

The correspondence $y \mapsto \xi$ is eager to cooperate with the linear
structure of V. If $y_1 \mapsto \xi_1$ and $y_2 \mapsto \xi_2$, then $y_1 + y_2 \mapsto \xi_1 + \xi_2$ (that's
easy), and if y induces $\xi$, then a scalar multiple $\alpha y$ induces (no, not $\alpha \xi$,
but almost) $\overline{\alpha} \xi$. Clear? If $(x, y) = \xi(x)$ for all x, then
$(x, \alpha y) = \overline{\alpha} (x, y) = \overline{\alpha} \xi(x)$ for all x. The correspondence $y \mapsto \xi$ doesn't
quite deserve to be called an isomorphism: it is a conjugate isomorphism.

Is the dual space of an inner product space an inner product space? 
Well, that depends: how is the inner product of two linear functionals $\xi_1$
and $\xi_2$ defined? If $\xi_1(x) = (x, y_1)$ and $\xi_2(x) = (x, y_2)$, the most natural
looking definition is probably this:

$$(\xi_1, \xi_2) = (y_1, y_2).$$

Trouble: it doesn't work. Is $(\xi_1, \xi_2)$ linear in $\xi_1$? Additivity is all right, but
is it true that

$$(\alpha \xi_1, \xi_2) = \alpha (\xi_1, \xi_2)?$$

No: since $\alpha y_1$ induces $\overline{\alpha} \xi_1$, so that $\alpha \xi_1$ is induced by $\overline{\alpha} y_1$, it follows that

$$(\alpha \xi_1, \xi_2) = (\overline{\alpha} y_1, y_2) = \overline{\alpha} (y_1, y_2) = \overline{\alpha} (\xi_1, \xi_2);$$

once more, a conjugation appears where it wasn’t invited.

There is a brute force remedy, but it’s far from clear on first glance 
that it will work: why not define $(\xi_1, \xi_2)$ to be $(y_2, y_1)$? Does it work? Yes.
Indeed, with that definition,

$$(\alpha \xi_1, \xi_2) = (y_2, \overline{\alpha} y_1) = \alpha (y_2, y_1) = \alpha (\xi_1, \xi_2).$$

Conclusion: with the inner product so defined, the space of all linear
functionals on an inner product space V is itself an inner product space, and, as
such, it is denoted by $V^*$. The isomorphism statement for the pure vector
spaces V and V' now extends to the inner product spaces V and $V^*$: they
too are conjugate isomorphic.

If in accordance with that conjugate isomorphism the spaces V and V* 
are identified —regarded as the same—then many earlier statements about 
the relation between V and V' become more interesting and more usable. 
So, for instance, the assertion that corresponding to each basis of V there 
is a dual basis in V' becomes the assertion that to each basis $(x_1, \ldots, x_n)$
of V there corresponds another basis $(\xi_1, \ldots, \xi_n)$ of V, the dual basis, such
that $(x_i, \xi_j) = \delta_{ij}$. The correspondence between subspaces of V and their
annihilators in V' becomes the correspondence between subspaces M of V 
and their orthogonal complements M+ also in V. Finally, and most impor- 
tantly, the correspondence between linear transformations on V and their 
adjoints on V' becomes a correspondence between linear transformations 
A on V and their adjoints on V*, denoted in this context by A*. The purely 
linear adjoints and the inner product kind differ in minor ways only, all of 
which have to do with the conjugation that has to be built into the complex 
theory. The differences are that 


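$$(\alpha A)^* = \overline{\alpha} A^* \quad (\text{not } \alpha A^*), \tag{1}$$

that the matrix of $A^*$ (with respect to an orthonormal basis) is the
conjugate transpose of the matrix of A,

$$(\alpha_{ij}) \text{ becomes } (\overline{\alpha_{ji}}) \quad (\text{not } (\alpha_{ji})), \tag{2}$$

and that

$$\det A^* = \overline{\det A} \quad (\text{not } \det A). \tag{3}$$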


That's the bad news—and it sure isn't very bad. The good news is that if A 
is a linear transformation, not only are A and A** comparable but so are 
A and A*. For A and A** “comparable” turns out to mean “equal”. For 
A and A* that can happen, but doesn’t have to, but, in any case, new and 
valuable questions can be asked. When is it true that $A^* = A$? How about
A* = —A? What can be said about the sum of A and A*? What about the 
product; when do A and A* commute? These questions are at the basis of 
the most important part of linear algebra. Before beginning their proper 
study, a couple of problems should be looked at by way of practice—the 
properties of the correspondence $A \mapsto A^*$ take a little getting used to.
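One way to get used to the adjoint is to compute with it. The sketch below is illustrative only: it assumes the standard inner product on complex 2-space (linear in the first variable, conjugate linear in the second), takes the adjoint to be the conjugate transpose, and uses hypothetical sample data with integer real and imaginary parts so that the arithmetic is exact.

```python
def inner(x, y):
    """Standard inner product on complex n-space: sum of x_i * conj(y_i)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def apply(m, x):
    """Apply the matrix m (a list of rows) to the vector x."""
    return [sum(m[i][k] * x[k] for k in range(len(x))) for i in range(len(m))]


def adjoint(m):
    """Conjugate transpose of m."""
    return [[m[j][i].conjugate() for j in range(len(m))]
            for i in range(len(m[0]))]


A = [[1 + 2j, 3 + 0j],
     [0 + 1j, 4 - 1j]]
x = [1 + 1j, 2 + 0j]
y = [3 + 0j, 1 - 2j]
alpha = 2 - 3j

# Defining property of the adjoint: (Ax, y) = (x, A*y).
lhs = inner(apply(A, x), y)
rhs = inner(x, apply(adjoint(A), y))
print(lhs == rhs)

# Scalars come out of the adjoint conjugated: (alpha A)* = conj(alpha) A*.
scaled = [[alpha * e for e in row] for row in A]
conjugated = [[alpha.conjugate() * e for e in row] for row in adjoint(A)]
print(adjoint(scaled) == conjugated)

# Taking the adjoint twice gives back the original matrix.
print(adjoint(adjoint(A)) == A)
```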


Problem 125. The direct sum $V \oplus W$ of two inner product spaces
is defined to be the direct sum of the vector spaces V and W endowed
with the inner product defined by

$$((x_1, y_1), (x_2, y_2)) = (x_1, x_2) + (y_1, y_2).$$

(Check: is this indeed an inner product?)

(a) If a linear transformation U is defined on $V \oplus V$ by

$$U(x, y) = (y, -x),$$

what is $U^*$? What are $U^*U$ and $UU^*$?

(b) The graph of a linear transformation A on a vector space V
is the set of all those ordered pairs (x, y) in $V \oplus V$ for which y = Ax.
Is the graph always a subspace of $V \oplus V$?

(c) If G is the graph of a linear transformation A on V, what is 
the graph of A*? How are those graphs related? 


126. Quadratic forms 


Adjoints enter linear algebra through still another door, the back door of 
quadratic forms (whose name, to be sure, doesn't sound very linear). A 
quadratic form is a specialization of a bilinear form, which, in turn is a 
generalization of a linear functional. 

The obvious way to generalize linear functionals is in the direction of 
the functions of several variables called multilinear forms. The easiest but 
nevertheless typical multilinear forms are the bilinear ones; they are, by 
definition, functions of two vector variables that are linear in each variable 
separately for fixed values of the other. Explicitly: if V is a vector space, a
scalar-valued function $\varphi$ on $V \oplus V$ is a bilinear form if

$$\varphi(\alpha_1 x_1 + \alpha_2 x_2, y) = \alpha_1 \varphi(x_1, y) + \alpha_2 \varphi(x_2, y)$$

for each y in V, and at the same time

$$\varphi(x, \beta_1 y_1 + \beta_2 y_2) = \beta_1 \varphi(x, y_1) + \beta_2 \varphi(x, y_2)$$

for each x in V. Example: if V is $\mathbb{R}^1$ and

$$\varphi(x, y) = xy,$$

then $\varphi$ is a bilinear form. Less trivially: if V is $\mathbb{R}^2$ and

$$\varphi((x_1, x_2), (y_1, y_2)) = x_1 y_1 + x_2 y_2,$$

then $\varphi$ is a bilinear form. (Check?) A quadratic form is a function obtained
from a bilinear one by restriction to equal values of the two variables. That
is: the quadratic form induced by the bilinear form $\varphi$ is the function $\hat{\varphi}$
defined by

$$\hat{\varphi}(x) = \varphi(x, x).$$


The most valuable example of a bilinear form is one that is not really 
an example at all. If V is an inner product space and if $\varphi$ is defined by

$$\varphi(x, y) = (x, y),$$

then $\varphi$ is linear in x, to be sure, and it's trying to be linear in y, but complex
conjugation ruins it. A curious usage of words has been adopted in
connection with this kind of occurrence of complex conjugation. A
complex-valued function $\xi$ that is additive,

$$\xi(x + y) = \xi(x) + \xi(y),$$

and fails to be linear because of complex conjugation,

$$\xi(\alpha x) = \overline{\alpha} \xi(x),$$


has come to be called semilinear. (Happy acceptance of this language 
might be difficult for some—it’s not obvious that such a function satisfies 
exactly half the conditions for linearity—but it is well established and it’s 
too late to change it.) In accordance with that usage a function $\varphi$ of two
variables that behaves like an inner product (linear in the first variable and 
semilinear in the second) could be called one-and-a-half linear—and that 
is almost what it is called. In fact the Latin for one-and-a-half is used, so 
that the technical word is sesquilinear. (Semilinear functions are some- 
times called antilinear and sometimes conjugate linear.) 

Linear transformations in conspiracy with inner products can be used 
to get many examples of sesquilinear forms. Here is how: if A is a linear 
transformation on an inner product space V, and if x and y are in V, write

$$\varphi(x, y) = (Ax, y).$$


All such examples are sesquilinear and therefore they act, in part, like inner 
products—but usually only in part. There is no reason why they should be 
Hermitian symmetric—sometimes they are and sometimes they are not— 
it all depends on properties of A. There is no reason why they should 
be positive definite—again whether they are or not depends on A. It is 
good to know just how these properties depend on A, and that will be dis- 
cussed soon. For now, however, it’s best to get back to the general theory 
of sesquilinear forms. 

If $\varphi$ is a sesquilinear form, what should $\hat{\varphi}$ be called? (Here, as
before, $\hat{\varphi}(x) = \varphi(x, x)$.) No accurate word exists (semiquadratic and
sesquiquadratic suggest themselves, but the world hasn't adopted either one),
and, therefore, an inaccurate one is commonly used. If $\varphi$ is a sesquilinear
form and $\hat{\varphi}(x) = \varphi(x, x)$, then, just as in the bilinear case, $\hat{\varphi}$ is called the
quadratic form associated with (induced by) $\varphi$.

There is a way of making new sesquilinear forms out of old. If $\varphi$ is
a sesquilinear form, and if P and Q are linear transformations, then the
expression

$$\varphi(Px, Qy)$$

defines another sesquilinear form; in this sense linear transformations
(or, rather, pairs of linear transformations) act on sesquilinear forms. If,
in particular,

$$\varphi(x, y) = (Ax, y),$$

then

$$\varphi(Px, Qy) = (APx, Qy) = (Q^* APx, y).$$

That is: the action of P and Q on $\varphi$ replaces the linear transformation A by
an equivalent one (in the strict sense of the word, as discussed in Problem
95). If $\hat{\varphi}$ is the quadratic form associated with $\varphi$, then the natural way to
mix in a linear transformation is to consider $\hat{\varphi}(Px)$. Since

$$\hat{\varphi}(Px) = (APx, Px) = (P^* APx, x),$$

the action of P on $\hat{\varphi}$ replaces A by the unfamiliar construct $P^* AP$. That
construct is a good thing to know about; the concept it defines is called
congruence. That is: two linear transformations A and B are called congruent
if there exists an invertible linear transformation P such that

$$B = P^* AP.$$


Invertibility is essential here; without it the relation is too loose to be of 
much interest. Congruence is a special case of equivalence. What are its 
special properties? 
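
The identity that leads to congruence, (APx, Px) = (P*APx, x), can be checked numerically. What follows is only a minimal sketch in plain Python: the matrices A and P and the vector x are arbitrary illustrative choices (they do not come from the text), the helper names inner, apply, matmul, and adjoint are hypothetical, and the inner product is assumed to be linear in its first argument and conjugate-linear in its second.

```python
# Assumed convention: (x, y) = sum of x_i * conj(y_i).
def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))

def apply(M, x):
    """Matrix times column vector."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def matmul(A, B):
    """Matrix product AB."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def adjoint(M):
    """Conjugate transpose of M."""
    return [[M[j][i].conjugate() for j in range(len(M))]
            for i in range(len(M[0]))]

# Illustrative data only (not taken from the text).
A = [[1 + 2j, 3j], [0j, 2 - 1j]]
P = [[1j, 1 + 0j], [2 + 0j, 1 - 1j]]   # determinant is -1 + i, so P is invertible
x = [1 - 1j, 2 + 3j]

Px = apply(P, x)
lhs = inner(apply(A, Px), Px)            # (APx, Px)
B = matmul(adjoint(P), matmul(A, P))     # B = P*AP, congruent to A
rhs = inner(apply(B, x), x)              # (P*APx, x)

print(abs(lhs - rhs) < 1e-9)
```

The two numbers agree up to rounding, which is all the sketch is meant to show; it says nothing about the questions in Problem 126.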


Problem 126. (a) Is congruence an equivalence relation?

(b) If A and B are congruent, are A* and B* congruent?

(c) Does there exist a linear transformation A such that A is
congruent to a scalar $\alpha$ but $A \ne \alpha$?

(d) Do there exist linear transformations A and B such that A
and B are congruent but $A^2$ and $B^2$ are not?

(e) Do there exist invertible linear transformations A and B such
that A and B are congruent but $A^{-1}$ and $B^{-1}$ are not?


127. Vanishing quadratic forms 


To what extent does a quadratic form (Ax, x) determine the linear
transformation A? Can it happen for two different transformations A and B
that (Ax, x) = (Bx, x) for all x?


Problem 127. Does there exist a non-zero linear transformation A
on an inner product space such that (Ax, x) = 0 for all x?


Comment. It is clear, isn't it?, that this question about zero is the same
as the uniqueness question: is it true that (Ax, x) = (Bx, x) for all x if and
only if ((A − B)x, x) = 0 for all x?


128. Hermitian transformations 


How closely do linear transformations on a vector space resemble complex 
numbers? Linear transformations can be added and multiplied—that’s old 
stuff, and it says no more than that they form a ring. Transformations on 
an inner product space admit another operation, one that resembles com- 
plex conjugation (adjoint)—that is another, different aspect of the resem- 
blance. 

Complex conjugation can be used to define various important sets of 
complex numbers. The most obvious one among them is the set of real 
numbers (some complex numbers are real)—they can be defined as the set 
of those complex numbers z for which $z = \bar{z}$. The transformation analog
is the set of those linear transformations A on an inner product space for 
which A = A*. They are called Hermitian, and they are among the ones 
that occur most frequently in the applications of linear algebra. 

If the matrix of a linear transformation A with respect to an orthonormal
basis is $(\alpha_{ij})$, then the matrix of A* is the conjugate transpose $(\bar{\alpha}_{ji})$
(see Problem 125). The use of orthonormal bases is crucial here. If, for
instance, A has the matrix


\[ \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \]


with respect to the orthonormal basis {(1, 0), (0, 1)}, then A is Hermitian.
If however the non-orthonormal basis $\{u_1, u_2\}$ is used, where


\[ u_1 = (1, 0) \quad\text{and}\quad u_2 = (1, 1), \]


then

\[ Au_1 = (0, 1) = (1, 1) - (1, 0) = u_2 - u_1, \]

and

\[ Au_2 = u_2, \]


so that the matrix of A with respect to $\{u_1, u_2\}$ is


\[ \begin{pmatrix} -1 & 0 \\ 1 & 1 \end{pmatrix}. \]

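
The change of basis just described can be replayed mechanically. The sketch below is an illustration under stated assumptions, not part of the original argument: A is taken to be the matrix that exchanges the two standard basis vectors (which is what the relations Au1 = u2 − u1 and Au2 = u2 require), and the helper names coords, apply, and matrix_in_basis are hypothetical.

```python
from fractions import Fraction

def coords(v, u1, u2):
    """Coordinates (a, b) with v = a*u1 + b*u2, found by Cramer's rule."""
    det = u1[0] * u2[1] - u2[0] * u1[1]
    a = Fraction(v[0] * u2[1] - u2[0] * v[1], det)
    b = Fraction(u1[0] * v[1] - v[0] * u1[1], det)
    return a, b

def apply(M, x):
    """2 x 2 matrix times a vector given as a pair."""
    return (M[0][0] * x[0] + M[0][1] * x[1],
            M[1][0] * x[0] + M[1][1] * x[1])

def matrix_in_basis(M, u1, u2):
    """Matrix of the transformation M with respect to the basis {u1, u2}.
    Column j holds the coordinates of M u_j."""
    c1 = coords(apply(M, u1), u1, u2)
    c2 = coords(apply(M, u2), u1, u2)
    return [[c1[0], c2[0]], [c1[1], c2[1]]]

A = [[0, 1], [1, 0]]            # matrix with respect to {(1, 0), (0, 1)}
u1, u2 = (1, 0), (1, 1)         # the non-orthonormal basis

M = matrix_in_basis(A, u1, u2)
symmetric = (M[0][1] == M[1][0])   # for real entries this is conjugate symmetry

print(M, symmetric)
```

The computed matrix has rows (−1, 0) and (1, 1); it is not conjugate symmetric even though the transformation is Hermitian, which is the point of insisting on orthonormal bases.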
Since it is easy enough to write down as many conjugate symmetric ma- 
trices as anyone could desire, it is easy to produce examples of Hermitian 
transformations. Very special case: a scalar transformation is Hermitian if 
and only if the scalar in question is real. 
What follows is a sequence of problems (puzzles) intended to give 
their solver the opportunity of getting used to the properties of Hermitian 
transformations. 


Problem 128. When is the product of two Hermitian transforma- 
tions Hermitian? 


129. Skew transformations 


The most unreal numbers are the so-called pure imaginary ones, the real
multiples of i. The complex conjugate of such a number is not equal to
itself but is equal to its negative: $\bar{i} = -i$. The transformation analogs of
those numbers are called skew Hermitian, or simply skew: they are the
transformations A for which A* = −A. Various combinations of Hermitian
transformations and skew transformations can sometimes turn out to
be Hermitian or skew; here are some sample questions.


Problem 129. (a) If A and B are congruent and A is skew, does it
follow that B is skew?
(b) If A is skew, does it follow that $A^2$ is skew? How about $A^3$?
(c) If A is either Hermitian or skew, and if B is either Hermitian
or skew, what can be said about AB + BA? What about AB − BA?


Comment. “Congruent” refers to the concept discussed in Problem 126.


130. Real Hermitian forms


How much do Hermitian transformations resemble real numbers? The
motivation for introducing them above was the equation A = A*, but that’s only
a formal analogy. If A is the matrix


\[ \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \]

—as non-Hermitian as any matrix can get—and if $x = (x_1, x_2)$, then


\[ (Ax, x) = \bar{x}_1 x_2 \]


—the quadratic form associated with A is as non-real as any can get. If, on
the other hand, A is


\[ \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \]


then


\[ (Ax, x) = \bar{x}_1 x_2 + x_1 \bar{x}_2 = 2\,\mathrm{Re}\,\bar{x}_1 x_2 \]


—as real as any can get. Are these phenomena typical?
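
The two quadratic forms can be evaluated at a sample vector to see the contrast. This is a small sketch, assuming that the first matrix is the nilpotent one with a single 1 above the diagonal and the second is the matrix that exchanges coordinates, and assuming an inner product that is linear in the first argument; the vector x and the helper names are illustrative choices.

```python
def inner(x, y):
    """(x, y) = sum of x_i * conj(y_i)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def apply(M, x):
    """Matrix times column vector."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

N = [[0, 1], [0, 0]]        # not Hermitian
H = [[0, 1], [1, 0]]        # Hermitian

x = [1 + 0j, 1 + 1j]        # sample vector: x1 = 1, x2 = 1 + i

q_N = inner(apply(N, x), x)   # conj(x1) * x2
q_H = inner(apply(H, x), x)   # 2 * Re(conj(x1) * x2)

print(q_N, q_H)
```

For this x the first value has a non-zero imaginary part and the second is real, in agreement with the two formulas above; a single vector proves nothing in general, of course.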


Problem 130. What is the relation between Hermitian transforma- 
tions and transformations with real quadratic form? 


131. Positive transformations 


Does the set of positive real numbers have as nice a transformation analog 
as the set of all real numbers? A natural attempt to define positiveness for 
transformations is to imitate the definition of reality via quadratic forms, 
and that in fact is what is usually done. A linear transformation A is called 
positive if $(Ax, x) \ge 0$ for every vector x; if $(Ax, x) > 0$ for all non-zero
x, the phrase positive definite is used. The symbolic way of saying that A is
positive is to write


\[ A \ge 0, \]

and the statement $A - B \ge 0$ can also be written as

\[ A \ge B. \]


The weak sign ($\ge$) can be replaced by the strong one (>) when the facts
permit it, and

\[ B \le A \]


means, of course, the same as $A \ge B$.


Examples: if A is an arbitrary linear transformation, then $A^*A \ge 0$
(because $(A^*Ax, x) = \|Ax\|^2$), and therefore if B is a Hermitian
transformation, then $B^2 \ge 0$ (because $B^2 = B^*B$). These statements are
transformation analogs of the numerical statements $\bar{z}z \ge 0$ (for every complex
number z) and $u^2 \ge 0$ (for every real number u).

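To say that a matrix is positive means that the linear transformation it
defines is positive; in matrix notation that is expressed by saying that


\[ \sum_{i,j} \alpha_{ij} \xi_j \bar{\xi}_i \ge 0 \]


for every vector $(\xi_1, \ldots, \xi_n)$. Concrete examples:


\[ \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \ge 0 \]


because


\[ 2|\xi_1|^2 + \bar{\xi}_1 \xi_2 + \xi_1 \bar{\xi}_2 + |\xi_2|^2 = |\xi_1 + \xi_2|^2 + |\xi_1|^2, \]


and (easier) [matrix illegible in the source], but [two matrices illegible
in the source] are not positive, in both cases because the values of their
quadratic forms at the vector (−1, 1) are negative.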

Some caution is called for in using the symbolism of ordering when
complex numbers enter the picture. Everybody agrees that 3 > 2, and
almost everybody sooner or later agrees that −2 > −3. What about


\[ 3 + i \ge 2 + i \]


—is that true? What is true is that subtracting the right side from the left
yields a positive number, but, nevertheless, most people feel uncomfortable
with the inequality. Common sense suggests that such inequalities are
best avoided, and experience shows that nothing is lost by avoiding them
and using inequalities for real numbers (and Hermitian transformations)
only. It is pertinent to recall that if $A \ge 0$, so that $(Ax, x) \ge 0$ for all x, then
in particular (Ax, x) is real for all x, and therefore A must be Hermitian
(Problem 130).


Problem 131. (a) Is there an example of a positive matrix not all of
whose entries are positive?

(b) Is there an example of a non-positive matrix all of whose
entries are positive?

(c) Is the matrix $\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}$ positive?

(d) Is the matrix $\begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}$ positive?

(e) For which values of $\alpha$ is the matrix $\begin{pmatrix} \alpha & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}$ positive?


132. Positive inverses 


Problem 132. If a positive transformation is invertible, must its
inverse also be positive?


133. Perpendicular projections 


The addition to a vector space of an inner product structure makes the the- 
ory more special (and therefore deeper); how does it affect the questions 
and answers about projections? The answer is that it affects those questions 
and answers quite a lot. The main reason for the change is that the inner 
product structure picks out a special one among the many complements 
that a subspace has, namely (obviously) the orthogonal complement.
Recall that if M and N are complementary subspaces of a finite-dimensional
vector space V, that is if


\[ M \cap N = \{0\} \quad\text{and}\quad M + N = V, \]


so that every z in V is uniquely representable as


\[ z = x + y \]


with x in M and y in N, then the projection E to M along N is the linear
transformation defined by Ez = x (Problem 72). If V is an inner product
space, then the projection onto a subspace M along its orthogonal
complement $M^\perp$ is called the perpendicular projection onto M. When
extraordinary caution is needed, that perpendicular projection can be denoted by
$P_M$, but most frequently, when the context makes notational and
terminological fuss unnecessary, even the word “perpendicular” is dropped and
people just speak of the projection onto M.

It would be a pity if perpendicular projections were lost in the crowd 
of all possible projections. Is there a way of recognizing them? 


Problem 133. Which linear transformations on an inner product 
space are perpendicular projections? 


Comment. The question is vague—the first problem is to look for a non- 
vague interpretation of it. In slightly less vague terms the challenge is to 
look for an algebraic characterization of those linear transformations on 
an inner product space that are perpendicular projections. 


134. Projections on C × C


Problem 134. What can the matrix of a projection on C × C (= $C^2$)
look like?


Caution. “Matrix” here refers to a matrix with respect to an orthonormal 
basis. 


135. Projection order 


How is the geometric ordering of projections related to the algebraically
defined ordering via positiveness (Problem 131)? The geometric ordering
is one that suggests itself naturally: if E and F are projections with ranges
M and N, then it is an almost irresistible temptation to say that E is smaller
than F in case M is smaller than N (meaning that $M \subset N$).


Problem 135. If E and F are perpendicular projections, is there an
implication in either direction between the statements


\[ E \le F \]

and


\[ \operatorname{ran} E \subset \operatorname{ran} F? \]


136. Orthogonal projections


How is the orthogonality of two subspaces reflected by their projections?


Problem 136. If E and F are perpendicular projections, what
algebraic relation between E and F characterizes the geometric property
of ran E being orthogonal to ran F?


Comment. Whatever the answer turns out to be, it seems reasonable to 
use for that algebraic relation the same word as for subspaces. That is, 
E and F shall be called orthogonal projections exactly in case ran E and 
ran F are orthogonal subspaces. 


137. Hermitian eigenvalues


How does the spectrum of a transformation reflect its structure? Partial 
answers to this vague question have occurred already, as for instance in the 
statement that the nilpotence of A is equivalent to spec A = {0} (Problem 
111). 

Another special sample question is this: can a real matrix (meaning
just that all its entries are real) have non-real eigenvalues? Yes. Example:
the eigenvalues of the matrix

\[ \begin{pmatrix} 3 & 4 \\ -4 & 3 \end{pmatrix}, \]


that is the roots of the quadratic equation

\[ \lambda^2 - 6\lambda + 25 = 0, \]


are 3 + 4i and 3 − 4i. (Is a general construction visible here? Can every
complex number be an eigenvalue of a real matrix?)
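
The roots quoted above are easy to confirm by machine. The following sketch assumes a particular real 2 × 2 matrix whose characteristic equation is the quadratic in the text (trace 6, determinant 25); that choice of matrix, and the helper name eigenvalues_2x2, are assumptions made for illustration.

```python
import cmath

def eigenvalues_2x2(M):
    """Roots of the characteristic polynomial t^2 - (trace) t + (det)."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)   # complex square root of the discriminant
    return ((tr + root) / 2, (tr - root) / 2)

# Assumed example: a real matrix with trace 6 and determinant 25.
M = [[3, 4], [-4, 3]]

eigs = eigenvalues_2x2(M)
residuals = [lam * lam - 6 * lam + 25 for lam in eigs]

print(eigs, residuals)
```

Both computed eigenvalues are non-real, and substituting them back into the quadratic gives zero.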

The concept of a “real matrix” is an artificial one—reality is not a 
property of a linear transformation but of a conspiracy between a linear 
transformation and a basis. (Such conspiracies are usually called matrices.) 
What about the notion of reality that is a property of a linear transforma- 
tion —does that behave differently? 


Problem 137. What can be said about the eigenvalues of Hermitian 
transformations? What about positive transformations? 


Question. Are the conditions on the eigenvalues necessary, or sufficient, 
or both? 


138. Distinct eigenvalues


Here is a good question to ask that may not occur to everyone
immediately but that does play a role in the theory: is there any relation between
eigenvectors belonging to different eigenvalues? Example: if


\[ A = \begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix}, \]


then (1, 0) is an eigenvector with eigenvalue 1 and (1, 1) is an eigenvector
with eigenvalue 3, and there is no obviously discoverable relation between
those two eigenvectors. Why not?


Problem 138. Is there any relation between eigenvectors belonging
to distinct eigenvalues of a Hermitian transformation?


CHAPTER 9 


NORMALITY 


139. Unitary transformations 


The three most obvious pleasant relations that a linear transformation on 
an inner product space can have to its adjoint are that they are equal (Her- 
mitian), or that one is the negative of the other (skew), or that one is the 
inverse of the other (not yet discussed). The word that describes the last of 
these possibilities is unitary: that’s what a linear transformation U is called 
in case it is invertible and $U^{-1} = U^*$. The definition can be expressed in
a “less prejudiced” way as U*U = 1—less prejudiced in the sense that it 
assumes less—but it is not clear that the less prejudiced way yields just as 
much. Does it? 


Problem 139. If U is a linear transformation such that U*U = 1,
does it follow that UU* = 1?


140. Unitary matrices 


It seems fair to apply the word “unitary” to a matrix in case the linear trans- 
formation it defines is a unitary one. (Caution: when language that makes 
sense in inner product spaces only is applied to matrices, the basis that es- 
tablishes the correspondence between matrices and linear transformations 
had better be an orthonormal one.) A quick glance usually suffices to tell 
whether or not a matrix is Hermitian; is there a way to tell by looking at a 
matrix whether or not it is unitary? The following special cases are a fair 
test of any proposed answer to the general question. 


Problem 140. (a) For which values of $\alpha$ is [2 × 2 matrix illegible in
the source] a unitary matrix?

(b) For which values of $\alpha$ is [2 × 2 matrix illegible in the source] a
unitary matrix?

(c) Is there a 3 × 3 unitary matrix whose first row is a multiple
of (1, 1, 1)?


141. Unitary involutions 


The two simplest properties that a linear transformation on an inner prod- 
uct space can have are being Hermitian or being unitary. A pleasantly and 
interestingly related property is being involutory. (A linear transformation 
U is called an involution or involutory if $U^2 = 1$.) What are the relations
among these properties? 


Problem 141. What are the implication relations among the
conditions U* = U, U*U = 1, and $U^2 = 1$?


142. Unitary triangles 


Problem 142. Which unitary matrices are triangular? 


143. Hermitian diagonalization 


Diagonal (or diagonalizable) matrices are pleasant to work with; it is al- 
ways good to discover of a class of matrices under study that they can be 
diagonalized. (Remember, for instance, the diagonalization of permuta- 
tions, Problem 103, and the diagonalization of transformations with dis- 
tinct eigenvalues, Problem 106.) 

Is every Hermitian transformation diagonalizable? Here is a phony
proof that the answer is yes. Given a Hermitian A, find a basis


\[ \{e_1, e_2, \ldots\} \]


such that the matrix of A with respect to that basis is upper triangular
(Problem 108). If, to be specific,


\[ Ae_j = \sum_i \alpha_{ij} e_i, \]

with $\alpha_{ij} = 0$ whenever i > j, then

\[ (Ae_j, e_k) = \sum_i \alpha_{ij} (e_i, e_k) = \sum_i \alpha_{ij} \delta_{ik} = \alpha_{kj}. \]

The Hermitian character of A implies that

\[ \alpha_{jk} = (Ae_k, e_j) = (e_k, Ae_j) = \overline{(Ae_j, e_k)} = \bar{\alpha}_{kj}, \]

and hence that $\alpha_{jk} = 0$ whenever j < k. Consequence:

\[ \alpha_{jk} = 0 \]


whenever $j \ne k$, and therefore the matrix $(\alpha_{ij})$ is diagonal.

What’s wrong with that proof? Answer: it uses the orthonormality of 
the basis $\{e_1, e_2, \ldots\}$, and that’s completely unjustified. All that the
triangularization theorem says is that there exists some basis that does the job—
it leaves open the question of whether or not there exists an orthonormal 
basis that does it. 

It’s easy enough to doctor up a basis so that with respect to it the matrix
of some Hermitian transformation comes out triangular but not diagonal.
For a concrete example, consider the linear transformation on $C^2$ whose
matrix is

\[ A = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}, \]


which is of course seen to be Hermitian by a casual glance. Consider now
the (non-orthonormal) basis


\[ \{(1, 1), (0, 1)\} \]

of $C^2$. Since

\[ A(1, 1) = (1, 1) \quad\text{and}\quad A(0, 1) = (-1, 2), \]

so that

\[ A(1, 1) = 1 \cdot (1, 1) + 0 \cdot (0, 1) \]

and

\[ A(0, 1) = -1 \cdot (1, 1) + 3 \cdot (0, 1), \]

it follows that the matrix of A with respect to that basis is

\[ \begin{pmatrix} 1 & -1 \\ 0 & 3 \end{pmatrix} \]


—triangular but not diagonal.

Which Hermitian matrices are triangular? The answer is “just the
diagonal ones”—that’s what the phony proof above really proves. The
original question, however, still stands.


Problem 143. If A is a Hermitian transformation on a finite-dimensional
complex inner product space, does there always exist an
orthonormal basis with respect to which the matrix of A is diagonal?


Comment. To say that a linear transformation is diagonalizable means 
that its matrix A (with respect to an arbitrary basis) is similar to a diagonal 
matrix, and that conclusion can be expressed by saying that there exists an 
invertible matrix T such that $T^{-1}AT$ is diagonal (Problem 86). In the same
way, the assertion that a linear transformation A is diagonalizable with re- 
spect to an orthonormal basis can be expressed by saying that there exists 
a unitary matrix U such that U* AU is diagonal. This assertion is an imme- 
diate consequence of its predecessor—all that has to be recalled is that a 
linear transformation that changes one orthonormal basis into another is 
necessarily unitary. The present question could therefore have been for- 
mulated this way: if A is a Hermitian matrix, does there always exist a uni- 
tary matrix U such that U* AU is diagonal? 


144. Square roots 


If A is a linear transformation, does $e^A$ make sense? Or cos A? Or $\sqrt{A}$?
The general question is what sense it makes to form functions of a trans- 
formation, and whether it does any good to do so. 

Yes, functions of transformations sometimes make sense and are 
sometimes very useful. The most typical and most important special case 
is the assertion that every invertible linear transformation (on a finite- 
dimensional complex vector space) has a square root (Problem 114). That 
is a matrix generalization of the statement that every complex number has 
a square root—a true statement that happens, however, not to be espe- 
cially useful. The useful fact is that every positive number has a positive 
square root. Is there a good matrix generalization of that? 


Problem 144. How many positive square roots can a positive linear 
transformation on a finite-dimensional inner product space have? 


145. Polar decomposition


Does it make sense to speak of the absolute value of a linear
transformation? An answer is suggested by the so-called polar representation

\[ \alpha = \rho e^{i\theta} \]

of a complex number. The angle (= real number) $\theta$ is between 0 and $2\pi$,
and the number $\rho$ is positive—and the latter is the absolute value of $\alpha$. (It is
worthy of note that except when $\alpha = 0$ the polar representation is unique.)
Can such a representation be imitated by linear transformations?

What does imitation mean? A good imitation of a positive number is,
presumably, a positive linear transformation. The equation


\[ e^{i\theta} \cdot \overline{e^{i\theta}} = 1 \]


suggests that a possible imitation of the angle part of the polar representa- 
tion is a linear transformation whose product with its own adjoint is equal 
to the identity transformation—that is, a unitary transformation. 


Problem 145. Which linear transformations A on a finite-dimensional
inner product space are equal to products UP with U unitary
and P positive?


146. Normal transformations 


Can every matrix be diagonalized? Why not? After all every A is equal to 
B + iC, with B and C Hermitian; why isn't it true that the diagonaliza- 
tions of B and C separately yield a diagonalization of A? The answer is, of 
course, that diagonalization involved finding a suitable orthonormal basis, 
and there is no reason to expect that a basis that diagonalizes B will have 
the same effect on C. 

All right then—are Hermitian transformations the only ones that can 
be diagonalized? Nonsense—of course not—for an example just consider 
a diagonal matrix such as 

\[ \begin{pmatrix} i & 0 \\ 0 & 1 \end{pmatrix} \]


that has a non-real entry. Emphasis: diagonalization in these questions 
means orthonormal diagonalization, or, from a different but equivalent 
point of view, unitary equivalence to a diagonal matrix; see the comment 
following Problem 143. 

To discover the right middle course to steer between the extravagantly 
large class of all transformations and the relatively too restricted class of 
Hermitian ones, the intelligent thing to do is to examine diagonalizable
matrices and try to discover what makes them so. If D is a diagonal matrix,
then so is D*, and, therefore, D and D* commute. That's a special property
of diagonal matrices; does it survive under unitary equivalence? That is: if
U is unitary and A = U*DU, is it true that A and A* commute? Sure:


\[
\begin{aligned}
AA^* &= U^*DUU^*D^*U \\
     &= U^*DD^*U \quad \text{(because $U$ is unitary)} \\
     &= U^*D^*DU \\
     &= U^*D^*UU^*DU \quad \text{(because $U$ is unitary)} \\
     &= A^*A.
\end{aligned}
\]
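
The computation can be mirrored numerically. This is a minimal sketch with an assumed 2 × 2 unitary matrix U and an assumed diagonal matrix D containing a non-real entry; both are illustrative choices, and the helper names are hypothetical.

```python
import math

def matmul(A, B):
    """Matrix product AB."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def adjoint(M):
    """Conjugate transpose of M."""
    return [[M[j][i].conjugate() for j in range(len(M))]
            for i in range(len(M[0]))]

def max_abs_diff(A, B):
    """Largest absolute entrywise difference between A and B."""
    return max(abs(A[i][j] - B[i][j])
               for i in range(len(A)) for j in range(len(A[0])))

s = 1 / math.sqrt(2)
U = [[s, s], [s * 1j, -s * 1j]]          # columns are orthonormal, so U is unitary
D = [[1 + 2j, 0j], [0j, 3 + 0j]]         # diagonal, with one non-real entry

A = matmul(adjoint(U), matmul(D, U))     # A = U*DU

identity_error = max_abs_diff(matmul(adjoint(U), U), [[1, 0], [0, 1]])
normality_error = max_abs_diff(matmul(A, adjoint(A)), matmul(adjoint(A), A))
hermitian_gap = max_abs_diff(A, adjoint(A))

print(identity_error, normality_error, hermitian_gap)
```

Up to rounding, A commutes with A*, while A itself differs from A*; so this A is one example of a transformation that passes the commutativity test without being Hermitian.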


Linear transformations with the commutativity property here
encountered (A*A = AA*) are called normal, and while at this stage the
connection between normality and diagonalizability is rather tenuous they
deserve a look. The best thing to look at might be a property of Hermitian
transformations that played an important role in the proof that they are
diagonalizable (see Solution 143)—do normal transformations have that
property?


Problem 146. Must eigenvectors belonging to distinct eigenvalues 
of a normal transformation (on a finite-dimensional inner product 
space) be orthogonal? 


147. Normal diagonalizability 


Are normal transformations good imitations of Hermitian ones? The use- 
ful Hermitian lemma that helped to prove that Hermitian transformations 
are diagonalizable extends to the normal case (that's what Problem 146 
did); does its consequence extend also? 


Problem 147. If A is a normal transformation on a finite-dimen- 
sional complex inner product space, does there always exist an or- 
thonormal basis with respect to which the matrix of A is diagonal? 


Comment. In less stuffy language the question is whether normal trans- 
formations are diagonalizable. 


148. Normal commutativity


The most important kind of questions linear algebra can ask and answer 
concern the relation between the algebra and the geometry of linear trans- 
formations. 

Here is an example: if two linear transformations A and B commute,
and if $\lambda$ is an eigenvalue of A with eigenvector x (so that $Ax = \lambda x$), then
$ABx = BAx = \lambda Bx$. That is: the algebraic assumption of commutativity
yields the geometric conclusion that each eigenspace of either
transformation is invariant under the other. Is the converse true: does the geometric
statement imply the algebraic one?

The answer is no. If, for instance, A and B are defined on $C^2$ by


\[ A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \]

and B = [matrix with first row (1, 1); second row illegible in the source],
then the only eigenspace of A is the set of all vectors of the form $(\alpha, 0)$,
and that set is invariant under B, but A and B do not commute.
Does the bad news become good if normality enters the picture? 


Problem 148. If the linear transformations A and B on an inner
product space are such that every eigenspace of A is invariant under 
B, and if B is normal, does it follow that AB = BA? What if B is 
not necessarily normal, but A is? 


149. Adjoint commutativity 


If A, B, and C are linear transformations such that A commutes with B 
and B commutes with C, does it follow that A commutes with C? In other 
words: is the relation of commutativity transitive? The suggestion is seen 
to be absurd almost sooner than it can be made: if B = 0, the assumptions
are satisfied, but there is no reason on earth for the conclusion to follow. 

The strongly negative nature of the answer adds interest to the study 
of special cases in an attempt to learn when the answer remains negative 
and when it just happens to be affirmative. As it turns out, moreover, some 
of those special cases are useful to know about, especially the ones that 
have to do with adjoints. 


Problem 149. If A, B, and C are linear transformations on an inner
product space V such that A commutes with B and B commutes
with C, and if two of the three are adjoints of one another, does it
follow that A commutes with C?


150. Adjoint intertwining


The standard way to describe the fact that AS = SA is, of course, to say 
that S commutes with A. An important generalization of commutativity 
also has a word associated with it, a less well-known word: if AS = SB, 
then S intertwines A and B. Commutativity is rare; intertwining is more 
common. When commutativity theorems can be extended to intertwining 
theorems, good applications usually follow. 

A good commutativity theorem is the one about adjoint commutativity
(Solution 149): if A is normal and AS = SA, then A*S = SA*. (Caution:
the assumption of normality is essential in this implication. The point is
that, no matter what A is, S can always be taken to be A itself, and the
commutativity assumption is satisfied; if, however, A is not normal, then
S, that is A, will not commute with A*.)

Does the adjoint commutativity theorem have an intertwining version?
That is: is it sometimes possible to start with AS = SB and infer that
A*S = SB*? The cautionary example above (A not normal and S = A)
shows that the implication is surely not always true, but there are worse
examples than that. Indeed, if


\[ A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \quad\text{and}\quad S = B, \]


then AS = SB = 0, whereas A*S (= AS) = 0 and


\[ SB^* = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}. \]


The reason this example is worse is that one of the constituents, namely A, 
is normal. Interchanging A and B and replacing S by 


0 1 

0 0 
yields an example in which B is normal. Are there bad examples like this 
even when both A and B are required to be normal? 
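
The first of the two bad examples can be verified by direct multiplication. The sketch below rests on an assumption about how the garbled matrices should be read: A is taken to be the projection onto the first coordinate, B the nilpotent matrix with a single 1 below the diagonal, and S = B. The helper names are hypothetical.

```python
def matmul(A, B):
    """Matrix product AB."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def adjoint(M):
    """Conjugate transpose of M (plain transpose for real entries)."""
    return [[M[j][i].conjugate() for j in range(len(M))]
            for i in range(len(M[0]))]

ZERO = [[0, 0], [0, 0]]

# Assumed reading of the example in the text.
A = [[1, 0], [0, 0]]     # Hermitian, hence normal
B = [[0, 0], [1, 0]]     # not normal
S = B

AS = matmul(A, S)
SB = matmul(S, B)
A_star_S = matmul(adjoint(A), S)
S_B_star = matmul(S, adjoint(B))

print(AS, SB, A_star_S, S_B_star)
```

Under that reading AS and SB both vanish, A*S vanishes too, and SB* does not, so S intertwines A and B without intertwining their adjoints.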


Problem 150. If A and B are normal linear transformations on a
finite-dimensional inner product space, and if AS = SB, does it
always follow that A*S = SB*?


151. Normal products 


The product of two self-adjoint transformations may fail to be self-adjoint, 
but if they commute then the product must be self-adjoint too. The proof 
is a trivial piece of algebra: if A = A* and B = B*, then (AB)* = B*A* = 
BA. Does the result extend to normal transformations? That is, if A and
B are normal and commutative, does it follow that AB is normal? The
answer is yes, but it is somewhat more subtle. One way to prove it is to
use the adjoint commutativity theorem. In view of the assumption AB =
BA, the normality of A implies that A*B = BA*. It follows that all four
transformations A, A*, B, and B* commute with one another, and hence
that


\[ (AB)(AB)^* = ABB^*A^* = B^*A^*AB = (AB)^*(AB). \]


For self-adjoint transformations, there is a converse theorem: if the 
product of two of them turns out to be self-adjoint, then they must com- 
mute. The proof is obvious: if 


\[ A = A^*, \quad B = B^*, \quad\text{and}\quad AB = (AB)^*, \]

then

\[ AB = B^*A^* = BA. \]


Does the converse extend to normal transformations? That question takes 
most people a few more seconds to answer than the self-adjoint one; the 
reason is that the answer is different. No, normal transformations with 
a normal product do not have to commute. Example: take any two
non-commutative unitary transformations; their product is unitary, therefore
normal.

There is still another twist on questions of this type. If A and B are 
normal and commutative, so that AB is normal, then, of course, BA is 
normal too (because it is equal to AB). Is commutativity needed to draw 
that conclusion? 


Problem 151. If the linear transformations A and B on a finite-
dimensional inner product space are such that A, B, and AB are
normal, does it follow that BA is normal?


152. Functions of transformations 


Polynomials of a linear transformation make sense, and (as Problem 144
indicated) sometimes they can be used to define more complicated
functions of linear transformations. If, for instance, A is a normal
transformation on a finite-dimensional vector space, then everything works smoothly.
For an arbitrary function f whose domain is at least as large as spec A
(the set of eigenvalues of A) a transformation f(A) can be defined by
finding a polynomial p that agrees with f on spec A and writing f(A) = p(A).

The process of forming functions of transformations is a powerful tool,
and what makes it so is that the algebraic and analytic properties of such
functions mirror faithfully the corresponding properties of numerical
functions. Thus, for instance, if A is normal and $f(z) = \bar{z}$ (complex conjugate),
then f(A) = A* (adjoint); if, in addition, A is invertible and $f(z) = 1/z$
whenever $z \ne 0$, then $f(A) = A^{-1}$; and if A is positive and $f(z) = \sqrt{z}$
whenever $z \ge 0$, then $f(A) = \sqrt{A}$ (the unique positive square root).
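
The recipe "find a polynomial p that agrees with f on spec A and put f(A) = p(A)" can be tried on a small case. The sketch below uses Lagrange interpolation on the spectrum; the matrix A, its eigenvalues 1 and 3 (supplied by hand), and the helper names are assumptions made for illustration, and the spectrum is assumed to consist of distinct numbers.

```python
import math

def matmul(A, B):
    """Matrix product AB."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def function_of(A, spectrum, f):
    """Evaluate p(A), where p is the Lagrange polynomial with
    p(lam) = f(lam) for every lam in the (distinct) spectrum."""
    n = len(A)
    result = [[0.0] * n for _ in range(n)]
    for lam in spectrum:
        # Start with f(lam) times the identity, then multiply in
        # the factors (A - mu) / (lam - mu) for the other eigenvalues mu.
        term = [[f(lam) if i == j else 0.0 for j in range(n)] for i in range(n)]
        for mu in spectrum:
            if mu != lam:
                factor = [[(A[i][j] - (mu if i == j else 0)) / (lam - mu)
                           for j in range(n)] for i in range(n)]
                term = matmul(term, factor)
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

A = [[2, 1], [1, 2]]          # Hermitian, with eigenvalues 1 and 3
spectrum = [1, 3]

R = function_of(A, spectrum, math.sqrt)   # candidate for the square root of A
R_squared = matmul(R, R)

print(R, R_squared)
```

Squaring the result reproduces A up to rounding, as the statement about positive square roots suggests it should.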

Some functions of linear transformations demand attention even in 
the absence of normality (and sometimes even in the absence of any inner
product structure in the underlying vector space); conspicuous among
them are $A \mapsto A^*$ and, for invertible transformations, $A \mapsto A^{-1}$. Can the
study of their behavior, good or bad, be reduced to the study of polynomi- 
als in A? 


Problem 152. (a) If A is a linear transformation on a finite-dimensional
inner product space, does there necessarily exist a polynomial
p such that p(A) = A*?

(b) If A is an invertible linear transformation on a finite-dimensional
vector space, does there necessarily exist a polynomial p such
that $p(A) = A^{-1}$?


153. Gramians 


How easy is it to recognize that a matrix is positive? ("Positive" is used here
in the quadratic form sense, as defined in Problem 131.) So, for example,
is either of the matrices


\[ \begin{pmatrix} 5 & 12 \\ 12 & 25 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 5 & 11 \\ 11 & 25 \end{pmatrix} \]

positive? The answer is no for the first one (note that its determinant is
negative) and yes for the second, but no single answer like that is impor- 
tant. An effectively computable test for positiveness would be pleasant to 
have, but, failing that, even an abstract characterization would be welcome. 
Prediction is sometimes more important in mathematics (if you do so and 
so, you'll get a positive matrix) than recognition (I don’t know what you 
did, but you ended up with a positive matrix). 

The challenge to write down a hundred different matrices is a trivial 
one, even if “different” is intended to suggest radical differences, not just 
trivial ones such as are possessed by different scalar multiples of one ma- 
trix. It's just as easy to write down a hundred different Hermitian matrices. 
Is it easy to write down a hundred radically different 4 x 4 positive matri- 
ces? 

Yes, it's easy. Consider any four vectors $x_1, x_2, x_3, x_4$ in $C^4$ and form
the matrix $A = (\alpha_{ij})$ whose entry $\alpha_{ij}$ in row i and column j is the inner
product


\[ (x_j, x_i). \]


Assertion: A is positive. For the proof what must be shown is that if
$u = (u_1, u_2, u_3, u_4)$ is any vector in $C^4$, then $(Au, u) \ge 0$. The proof is a
straightforward computation. If $Au = v = (v_1, v_2, v_3, v_4)$, so that


\[ v_j = \sum_i (x_i, x_j) u_i, \]

then

\[
\begin{aligned}
(Au, u) = (v, u) &= \sum_j \sum_i (x_i, x_j) u_i \bar{u}_j \\
&= \sum_j \Bigl( \sum_i u_i x_i,\ x_j \Bigr) \bar{u}_j = \Bigl( \sum_i u_i x_i,\ \sum_j u_j x_j \Bigr) \\
&= \Bigl\| \sum_i u_i x_i \Bigr\|^2 \ge 0.
\end{aligned}
\]

A matrix such as the A here defined is called a Gramian, or, more 
specifically, the Gramian of the vectors x_1, x_2, x_3, x_4. What was just proved
is that every Gramian is positive. To what extent is the converse of that 
statement true? 
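
The assertion that every Gramian is positive can be spot-checked numerically. The sketch below builds the Gramian of four sample vectors in C^4 and evaluates (Au, u) for many vectors u. The helper names (`inner`, `gramian`, `quadratic_form`) and the sample vectors are illustrative assumptions, and the inner product is taken to be linear in its first argument and conjugate-linear in its second.

```python
import random


def inner(x, y):
    """Inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


def gramian(vectors):
    """Matrix whose (i, j) entry is the inner product (x_i, x_j)."""
    return [[inner(xi, xj) for xj in vectors] for xi in vectors]


def quadratic_form(matrix, u):
    """Return (Au, u) for a square matrix A and a vector u."""
    n = len(u)
    au = [sum(matrix[i][j] * u[j] for j in range(n)) for i in range(n)]
    return inner(au, u)


# Four sample vectors in C^4, chosen only for illustration.
xs = [
    [1, 2j, 0, -1],
    [0, 1 + 1j, 3, 0],
    [2, 0, -1j, 1],
    [1, 1, 1, 1j],
]
A = gramian(xs)

rng = random.Random(0)
for _ in range(100):
    u = [complex(rng.randint(-5, 5), rng.randint(-5, 5)) for _ in range(4)]
    value = quadratic_form(A, u)
    # All data are small Gaussian integers, so the arithmetic is exact.
    assert value.imag == 0 and value.real >= 0
```

A finite sample proves nothing, of course; the computation in the text is what shows that the inequality always holds.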


Problem 153. Which positive matrices are Gramians? 


154. Monotone functions 


Problem 154. If A and B are positive transformations on a finite-dimensional
inner product space such that

\[
A \le B,
\]

does it follow that

\[
A^2 \le B^2?
\]

How about

\[
\sqrt{A} \le \sqrt{B}?
\]

Comment. The question could have been phrased by asking whether
square and square root are monotone functions of transformations. 

The restriction to positive transformations is advisable, isn’t it? After 
all the function x ↦ x^2 on the real line is not a monotone function (although
it is true that −3 ≤ 2, it is not true that 9 ≤ 4), but its restriction
to the positive part of the line is. In other words, for positive real numbers 
the answers are yes, which can be read as saying that for linear transforma- 
tions on a vector space of dimension 1 the answers are yes; the question is 
whether the answers remain yes for spaces of higher dimensions. 


155. Reducing ranges and kernels 


Invariant subspaces for a linear transformation A on a finite-dimensional 
vector space V are easy enough to find; the difficulty, usually, is to prove 
that they are different from the trivial subspace {0} and the improper sub- 
space V. If the vector space is equipped with an inner product, it becomes 
natural to look for reducing subspaces (subspaces invariant under both A 
and its adjoint A*); they are harder to find. Thus, for instance, both ran A 
and ker A are invariant under A, but they may well fail to be reducing. An 
easy example is furnished by the 2 × 2 matrix that is an almost universal
counterexample. If

\[
A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
\]

is regarded as a linear transformation on the space C^2, then both ran A
and ker A are equal to the set of all vectors of the form (x, 0), but neither
one of them is invariant under A*. Does something more useful happen if
intersections and spans of ranges and kernels are allowed?
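
The claims about this counterexample are easy to confirm by direct computation. The sketch below is only an illustration; `apply` and `adjoint` are hypothetical helper names, and matrices are stored as lists of rows.

```python
def apply(matrix, vector):
    """Apply a matrix (list of rows) to a column vector."""
    return [sum(row[j] * vector[j] for j in range(len(vector)))
            for row in matrix]


def adjoint(matrix):
    """Conjugate transpose of a matrix given as a list of rows."""
    rows, cols = len(matrix), len(matrix[0])
    return [[matrix[i][j].conjugate() for i in range(rows)]
            for j in range(cols)]


A = [[0, 1],
     [0, 0]]
A_star = adjoint(A)

x = [1, 0]                 # spans the set of vectors of the form (x, 0)
print(apply(A, [0, 1]))    # [1, 0]: the range of A consists of vectors (x, 0)
print(apply(A, x))         # [0, 0]: x lies in the kernel of A
print(apply(A_star, x))    # [0, 1]: A* moves x out of that subspace
```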


Problem 155. If A is a linear transformation on a finite-dimensional
inner product space, which of the subspaces obtained from ran A, 
ker A, ran A*, and ker A* by the formation of intersections and spans 
necessarily reduce A? 


156. Truncated shifts 


The larger the domain of a linear transformation, the more likely it is to 
have invariant and reducing subspaces. The easiest example of an irre- 
ducible linear transformation on a vector space V (that is, a transformation 
with no reducing subspaces other than {0} and V) is the one induced on 
C^2 by the matrix

\[
\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}
\]

(compare Problem 74). That example has natural generalizations to higher
dimensions, such as

\[
\begin{pmatrix}
0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0
\end{pmatrix}
\]

on C^5 for instance.

Another way of describing the same phenomenon is to consider any
basis {x_1, ..., x_n} in any vector space V of dimension n, say, and let A be
the truncated shift that sends (shifts) x_j to x_{j+1} (1 ≤ j < n) and sends
x_n to 0. The matrix of A with respect to the basis {x_1, ..., x_n} is just like
the one displayed above (with the role of 5 being played by n). Note that
A^{n-1} ≠ 0 but A^n = 0; the transformation A is nilpotent of index exactly
n.
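
The nilpotence statement can be checked for small n by brute force. This is a sketch under the usual convention that column j of the matrix holds the coordinates of the image of x_j, so that the 1's sit just below the diagonal. The function names are hypothetical.

```python
def truncated_shift(n):
    """n x n matrix sending x_j to x_{j+1} (j < n) and x_n to 0."""
    return [[1 if i == j + 1 else 0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Product of two square matrices of the same size."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size))
             for j in range(size)] for i in range(size)]


def nilpotency_index(a):
    """Smallest k >= 1 with a^k = 0, or None if no power up to n vanishes."""
    power = a
    for k in range(1, len(a) + 1):
        if all(entry == 0 for row in power for entry in row):
            return k
        power = matmul(power, a)
    return None


A = truncated_shift(5)
print(nilpotency_index(A))   # 5
```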

When n is large, the domain of the linear transformation A is larger 
than it is when n = 2. How effective is the enlargement as a producer of
invariant and reducing subspaces? 


Problem 156. How many invariant subspaces does a truncated shift 
have? How many reducing subspaces does it have? 


157. Non-positive square roots 


Positive linear transformations are not the only ones for which the problem 
of square roots makes sense. Every linear transformation has a square, and 
that shows that many transformations have square roots even though they 
have nothing to do with positiveness. 


Does the matrix

\[
A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}
\]

have a square root? No, it does not. If it happened that B^2 = A, then (since
A is nilpotent of index 3), it would follow that B^6 = 0, and that would imply
that the minimal polynomial of B is a power of λ. Since the degree cannot
be greater than 3 (the degree of the characteristic polynomial), it follows
that B^3 = 0, whence A^2 = B^4 = 0, which is a contradiction. This sort of
argument can be used often, and it implies, for instance, that no truncated 
shift can have a square root. There are, however, many transformations 
about which the argument gives no information, however much they may 
resemble truncated shifts. 


Problem 157. Does the matrix

\[
A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\]

have a square root?


158. Similar normal transformations 


The “right” relation of equivalence between linear transformations on an 
abstract vector space is similarity; from the point of view of the structure of 
vector spaces similar transformations are indistinguishable. Inner product 
spaces have a richer structure; a relation (such as similarity) that ignores 
that structure does not give the best information. The “right” relation be- 
tween linear transformations on inner product spaces is unitary similarity 
(often, somewhat misleadingly, called unitary equivalence). 

What information does unitary similarity give that ordinary similar- 
ity does not? Since the rich structure of inner product spaces consists of 
numerical ways of measuring sizes (angles and lengths), the most natural 
answer to the question is “size”. Consider, for instance, the matrices 


\[
A = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}
\quad\text{and}\quad
B = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.
\]

Computational verification (or a moment's thought about known elementary
sufficient conditions for similarity) will establish that A and B are
similar. The unit vectors (1, 0) and (1/√2, −1/√2) are eigenvectors of A; since
their inner product is 1/√2, the angle between them is π/4. The unit vectors
(1, 0) and (0, 1) are eigenvectors of B; the angle between them is π/2. The
norm of A is √2 (compute the larger of the square roots of the eigenvalues
of A*A); the norm of B is 1. Conclusion: A and B, though similar, are
certainly not unitarily similar; every measure of size indicates a difference 
between them. 
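
The numbers quoted in this comparison can be reproduced with a few lines of arithmetic. The sketch below assumes that the two matrices are A = (1 1; 0 0) and B = (1 0; 0 0), which is the reading consistent with the eigenvectors and norms stated in the text. The helper `operator_norm_2x2` is a hypothetical name; it computes the square root of the larger eigenvalue of the transpose times the matrix, for real 2 × 2 matrices only.

```python
import math


def matmul(a, b):
    """Product of two 2 x 2 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def operator_norm_2x2(m):
    """Norm of a real 2 x 2 matrix: square root of the larger
    eigenvalue of (transpose of m) times m."""
    (a, b), (c, d) = m
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d
    trace, det = p + r, p * r - q * q
    largest = (trace + math.sqrt(max(trace * trace - 4 * det, 0))) / 2
    return math.sqrt(largest)


A = [[1, 1], [0, 0]]   # assumed reading of A
B = [[1, 0], [0, 0]]   # assumed reading of B

# The columns of S are the eigenvectors (1, 0) and (1, -1) of A, so AS = SB;
# det S = -1 is non-zero, hence A and B are similar.
S = [[1, 1], [0, -1]]
print(matmul(A, S) == matmul(S, B))                        # True

# Angle between the unit eigenvectors (1, 0) and (1/sqrt 2, -1/sqrt 2) of A.
cosine = 1 * (1 / math.sqrt(2)) + 0 * (-1 / math.sqrt(2))
print(math.isclose(math.acos(cosine), math.pi / 4))        # True

print(math.isclose(operator_norm_2x2(A), math.sqrt(2)))    # True
print(math.isclose(operator_norm_2x2(B), 1.0))             # True
```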

The linear transformation defined by B is pleasantly related to the
inner product structure: B is normal. The transformation defined by A is
not normal. These facts establish once again that A and B couldn't possibly
be unitarily similar, and they suggest a search for an even more powerful
counterexample.


Problem 158. Do there exist two normal transformations that are 
similar but not unitarily similar? 


159. Unitary equivalence of transposes 


The hardest and most important problem about many mathematical struc- 
tures is to determine when two of them are the “same”. One such problem 
in linear algebra is to find out when two linear transformations are simi- 
lar, or, in case the underlying vector space comes endowed with an inner 
product, to find out when two linear transformations are unitarily equiv- 
alent. There exists something called elementary divisor theory, which fre- 
quently yields satisfactory answers to questions of the first of these types, 
good enough for explicit calculations. The second type of question is usu- 
ally much harder. 

Suppose, for instance, that A is a linear transformation on a finite- 
dimensional inner product space. The transformations A and A* have 
much in common, especially as far as sizes are concerned. A trivial obser- 
vation along these lines is that the geometric norms of A and A* are the 
same. Since det A* is the complex conjugate of det A, it follows that if λ is
an eigenvalue of A, then λ̄ (the complex conjugate of λ) is an eigenvalue
of A*, and hence, in particular, that |λ̄| = |λ|; the sizes of the eigenvalues
of the two transformations are the same. 

Could it be that A and A* are always unitarily equivalent? No, that's 
absurd: complex conjugation is in the way. Unitarily equivalent transfor- 
mations have the same eigenvalues, but it can perfectly well happen that a 
linear transformation A has λ for an eigenvalue but not λ̄; in that case A
and A* cannot possibly be unitarily equivalent. 

The adjoint of a matrix is the complex conjugate of its transpose. If the 
conjugation step is omitted, does the difficulty disappear? 


Problem 159. Is every matrix unitarily equivalent to its transpose?


To interpret the question, identify each n × n matrix with the linear
transformation that it induces on the inner product space C^n. Note that,
by the elementary divisor theory referred to above, if unitary equivalence is 
replaced by similarity, then the answer is yes: a matrix A and its transpose 
A’ obviously have the same elementary divisors. 


160. Unitary and orthogonal equivalence


If two real matrices are complex similar, then they are real similar—a con- 
struction to prove that was carried out in Problem 89. What happens (spe- 
cial case) when two real matrices are unitarily equivalent? 

Is it possible for two real matrices to be unitarily equivalent in a com- 
plex way? That is: do there exist real matrices A and B and a complex 
unitary matrix U such that U* AU = B? The question is not sharp enough; 
it admits trivial answers such as A = B = 0 and U = 1. An answer like that
would be described as trivial by everyone, but just exactly what deserves to 
be called non-trivial? 

It’s not enough to insist that U be genuinely complex; trivial answers 
still exist. Example: let U be an arbitrary “genuinely complex” unitary ma- 
trix and take A = B = 1. 

Here is an example of a different kind: take

\[
A = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \quad
B = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad
S = \begin{pmatrix} 0 & i \\ 1 & 0 \end{pmatrix},
\]

and verify that S*AS (= S^{-1}AS) = B. The technique of Problem 89
shows that if P and Q are the real and imaginary parts of S,

\[
S = P + iQ,
\]

then there exist real numbers λ such that T = P + λQ is invertible, and
for any such T it is true that T^{-1}AT = B. The matrices T are of the form

\[
\begin{pmatrix} 0 & \lambda \\ 1 & 0 \end{pmatrix},
\]

and while it's quite possible for such a matrix to be unitary, that
pleasant phenomenon occurs only rarely.

Still another example: take an arbitrary real A and an arbitrary real
V that is unitary, define B = V*AV, and write U = αV, where α is an
arbitrary complex number of modulus 1. This latter example leads to others
that cannot be spotted so easily. Form the direct sum of two of them,
and then transform every transformation that enters by a sufficiently
complicated looking real unitary transformation W. (To transform means to
replace X by W*XW.) The new U (a complicated real transform of the
direct sum of a couple of complex scalar multiples of two other real U's)
looks as genuinely complex as anything can, but it succeeds in transforming
the new real A to the new real B.

Are all these examples "artificial"? The following problem is a precise
way of putting this somewhat vague question.
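
The 2 × 2 example involving S earlier in this section can be verified mechanically. The sketch below assumes the reading A = (0 0; 0 1), B = (1 0; 0 0), S = (0 i; 1 0), which is consistent with the stated form (0 λ; 1 0) of the matrices T. The helper names are hypothetical.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    inner_size = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner_size))
             for j in range(len(b[0]))] for i in range(len(a))]


def adjoint(m):
    """Conjugate transpose of a matrix given as a list of rows."""
    return [[m[i][j].conjugate() for i in range(len(m))]
            for j in range(len(m[0]))]


def is_unitary(m):
    """True if (adjoint of m) times m is the identity matrix."""
    n = len(m)
    identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    return matmul(adjoint(m), m) == identity


A = [[0, 0], [0, 1]]    # assumed reading
B = [[1, 0], [0, 0]]    # assumed reading
S = [[0, 1j], [1, 0]]   # assumed reading; S = P + iQ

print(is_unitary(S))                            # True
print(matmul(matmul(adjoint(S), A), S) == B)    # True

# T = P + lam * Q.  Since T is invertible when lam != 0, the equation
# AT = TB is equivalent to T^{-1} A T = B.
results = {}
for lam in (1, -1, 2, 0.5):
    T = [[0, lam], [1, 0]]
    results[lam] = (matmul(A, T) == matmul(T, B), is_unitary(T))
print(results)
```

In this sample the transform T works for every non-zero λ tried, but it is unitary only for λ = 1 and λ = −1, which is the sense in which the pleasant phenomenon is rare.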


Problem 160. If two real matrices are (complex) unitarily equivalent,
does it follow that they are also (real) orthogonally equivalent?


161. Null convergent powers


If a finite-dimensional (real or complex) vector space V does not come 
equipped with an inner product, then there is no natural notion of distance 
for vectors in it or linear transformations on it, but there are many equally 
good “unnatural” ones. To get one of them, choose a basis, and, using it, 
establish an isomorphism between V and C^n for the appropriate n. The
isomorphism transplants the natural inner product structure (and hence
the analytic and topological structures, such as distance and convergence)
from C^n to V. It is good to know (and not especially difficult to prove) that
while the distances obtained this way may be very different from one another,
the topologies are all the same. If, in particular, (A_k) is a sequence
of linear transformations on a finite-dimensional (real or complex) vector 
space, then it might make sense to ask whether the sequence converges to 
something. It might make sense, but it is usually not worth the trouble to 
ask such questions; it is simpler and more honest to restrict attention to 
matrices in the first place. 

As an illustration of the kind of analytic question that it is often useful
to ask about matrices, consider this one: if a complex matrix A is such that
A^n → 0 as n → ∞, what can be said about the eigenvalues of A? Answer:
every one of them must be strictly less than 1 in absolute value. Reason:
if Ax = λx, then A^n x = λ^n x, and therefore, provided only that x ≠ 0, it
follows that λ^n → 0. Is the converse true?
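
The step from Ax = λx to A^n x = λ^n x can be watched in exact arithmetic. The sample matrix and eigenvector below are illustrative assumptions, not taken from the text; `apply` is a hypothetical helper.

```python
from fractions import Fraction


def apply(matrix, vector):
    """Apply a matrix (list of rows) to a column vector."""
    return [sum(row[j] * vector[j] for j in range(len(vector)))
            for row in matrix]


half, quarter = Fraction(1, 2), Fraction(1, 4)
A = [[half, quarter],
     [quarter, half]]             # sample matrix, chosen for illustration
x = [Fraction(1), Fraction(1)]    # eigenvector: A x = (3/4) x
lam = Fraction(3, 4)

y = x
for n in range(1, 11):
    y = apply(A, y)                          # y is now A^n x
    assert y == [lam ** n * c for c in x]    # and it equals lam^n x

print(float(lam ** 10))   # roughly 0.056, already small
```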


Problem 161. If every eigenvalue of a complex matrix A is strictly
less than 1 in absolute value, does it follow that A^n → 0 as n → ∞?


162. Power boundedness 


The powers of a linear transformation A can exhibit several kinds of good 
and bad behavior. Solution 161 discussed the possibilities ||A^n|| → 0 (good)
and ||A^n|| → ∞ (bad). A good possibility between those two extremes
is that ||A^n||, as a function of n, is bounded. The possibility is important
enough that it deserves to be given its name even before it is adequately 
studied and characterized; a linear transformation with that property is 
called power bounded. Which ones are? 

If ||A|| ≤ 1 (that can be expressed by the usual technical term: A is a
contraction), then surely ||A^n|| ≤ 1 for all n. Contractions must be power
bounded; can anything else be? Is it possible to have a transformation A
with ||A|| = 2 and ||A^n|| bounded? Yes, it is. Example:

\[
A = \begin{pmatrix} 0 & 2 \\ 0 & 0 \end{pmatrix}.
\]

The point is, of course, that A^n = 0 whenever n ≥ 2; the presence of the
entry 2 doesn't affect any of the powers of A after the first.
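
Both facts about this example, that the norm is 2 and that the square vanishes, are quick to confirm. The sketch assumes the matrix is (0 2; 0 0); `operator_norm_2x2` is a hypothetical helper valid for real 2 × 2 matrices only.

```python
import math


def matmul(a, b):
    """Product of two 2 x 2 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def operator_norm_2x2(m):
    """Norm of a real 2 x 2 matrix: square root of the larger
    eigenvalue of (transpose of m) times m."""
    (a, b), (c, d) = m
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d
    trace, det = p + r, p * r - q * q
    largest = (trace + math.sqrt(max(trace * trace - 4 * det, 0))) / 2
    return math.sqrt(largest)


A = [[0, 2], [0, 0]]           # assumed reading of the example
print(operator_norm_2x2(A))    # 2.0
print(matmul(A, A))            # [[0, 0], [0, 0]]
```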

Examples like the last might tend to shake one's faith in the possibility
of a good connection between power boundedness and contractions; the
next comment might restore some of that faith. If a linear transformation
A is not a contraction but is similar to one,

\[
A = S^{-1} C S, \qquad \|C\| \le 1,
\]

then

\[
\|A^n\| = \|S^{-1} C^n S\| \le \|S^{-1}\| \cdot \|C\|^n \cdot \|S\| \le \|S^{-1}\| \cdot \|S\|,
\]

so that A is power bounded. How likely is that faith to be shaken?


Problem 162. Is every power bounded linear transformation on a
finite-dimensional inner product space necessarily similar to a con- 
traction? 


163. Reduction and index 2 


The easy examples of irreducible transformations turn out to be nilpotent 
(see Solution 156); it almost looks as if every nilpotent transformation must 
be irreducible. That's not true, of course: for a counterexample just form 
the direct sum of two nilpotent transformations to get one that is still nilpo- 
tent but definitely not irreducible. Contemplation of such examples sug- 
gests a question: are there relations between the index of nilpotence and 
the dimension of the space that either force or prevent irreducibility? The 
answer is not obvious even for the lowest possible index. 


Problem 163. Is there an irreducible nilpotent transformation of
index 2 on a space of dimension greater than 2? 


164. Nilpotence and reduction 


If a linear transformation A on a vector space V of dimension n is nilpotent 
of index k, what can be said about the existence of reducing subspaces for 
A? If k < n, then A can be reducible (trivial, form direct sums); if k = n,
then A cannot be reducible (that is in effect what Solution 151 shows); and
if k = 2 and n > 2, then A must be reducible (see Solution 163). What
can be guessed from that much evidence? A possible guess is that if k < n,
then A is necessarily reducible. Is that true? 


Problem 164. Can a nilpotent linear transformation A of index 3 
on an inner product space V of dimension 4 be irreducible? 


HINTS 


Chapter 1. Scalars 


Hint 1. Write down the associative law for ⊞.
Hint 2. Same as for Problem 1: substitute and look.
Hint 3. How could it be?

Hint 4. Note the title of the problem. 


Hint 5. The affine transformation of the line associated with the real
numbers α and β is the one that maps each real number ξ onto αξ + β.


Hint 6. Does it help to think about 2 x 2 matrices? If not, just compute. 


Hint 7. Let r_6 (for “reduce modulo 6”) be the function that assigns to
each non-negative integer the number that's left after all multiples of 6 are
thrown out of it. Examples: r_6(8) = 2, r_6(183) = 3, and r_6(6) = 0. Verify
that the result of multiplying two numbers and then reducing modulo 6
yields the same answer as reducing them first and then multiplying the
results modulo 6. Example: the ordinary product of 10 and 11 is 110, which
reduces modulo 6 to 2; the reduced versions of 10 and 11 are 4 and 5, whose
product modulo 6 is 20 − 18 = 2.
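
The verification suggested in this hint can be tried on a finite sample before it is proved. A minimal sketch, with `r6` standing for the reduction function described above:

```python
def r6(n):
    """Reduce a non-negative integer modulo 6."""
    return n % 6


print(r6(8), r6(183), r6(6))               # 2 3 0
print(r6(10 * 11), r6(r6(10) * r6(11)))    # 2 2

# Multiply-then-reduce agrees with reduce-then-multiply on a finite sample.
assert all(r6(a * b) == r6(r6(a) * r6(b))
           for a in range(100) for b in range(100))
```

The sample check is evidence, not a proof; the proof is the exercise.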


Hint 8. The answer may or may not be easy to guess, but once it’s correctly
guessed it’s easy to prove. The answer is yes.


Hint 9. Not as a consequence but as a coincidence the answer is that the 
associative ones do and the others don’t. 


Hint 10. To find 1/(α + β√2), multiply both numerator and denominator by
α − β√2. The old-fashioned name for the procedure is “rationalize the
denominator”.
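
Rationalizing the denominator can be carried out in exact arithmetic. The sketch below assumes that the numbers in question have the form α + β√2 with rational α and β, and stores each one as the pair (α, β). The function names are hypothetical.

```python
from fractions import Fraction


def multiply(p, q):
    """Product of a + b*sqrt(2) and c + d*sqrt(2), each stored as a pair."""
    (a, b), (c, d) = p, q
    return (a * c + 2 * b * d, a * d + b * c)


def inverse(p):
    """Inverse of a + b*sqrt(2), for rational a and b not both zero.

    Multiplying numerator and denominator by a - b*sqrt(2) gives
    (a - b*sqrt(2)) / (a*a - 2*b*b).
    """
    a, b = Fraction(p[0]), Fraction(p[1])
    denom = a * a - 2 * b * b    # non-zero because sqrt(2) is irrational
    return (a / denom, -b / denom)


for p in [(1, 1), (3, 2), (Fraction(1, 2), -5), (0, 7)]:
    assert multiply(p, inverse(p)) == (1, 0)
```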


Hint 11. The unit is (1, 0). Caution: non-commutativity. 


Hint 12. The unit is P 3 


Hint 13. (a) and (b). Are the operations commutative? Are they associative?
Do the answers change if R+ is replaced by [0, 1]?


(c) Add (—z) to both sides of the assumed equation. 


Hint 14. An affine transformation ξ ↦ αξ + β with α = 0 has no inverse;
a matrix

\[
\begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}
\]

with αδ − βγ = 0 has no inverse.


The integers modulo 3 form an additive group, and so do the integers 
modulo anything else. Multiplication is subtler. Note: the number 6 is not 
a prime, but 7 is. 
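
The subtlety of multiplication can be seen by listing, for a given modulus, the residues that have multiplicative inverses. A small exploratory sketch; the function name is hypothetical.

```python
def invertible_residues(m):
    """Non-zero residues modulo m that have a multiplicative inverse."""
    return [a for a in range(1, m)
            if any(a * b % m == 1 for b in range(1, m))]


print(invertible_residues(6))   # [1, 5]
print(invertible_residues(7))   # [1, 2, 3, 4, 5, 6]
```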


Hint 15. If the underlying set has only two elements, then the answer is 
no. 


Hint 16. Use both distributive laws. 

Hint 17. In the proofs of the equations the distributive law must enter
directly or indirectly; if not there's something wrong. The non-equations
are different: one of them is true because that's how language is used, and
the other is not always true.


Hint 18. Think about the integers modulo 5.


Hint 19. The answer is yes, but the proof is not obvious. One way to do it
is by brute force; experiment with various possible ways of defining + and
×, and don't stop till the result is a field.

A more intelligent and more illuminating way is to think about polynomials
instead of integers. That is: study the set P of all polynomials with
coefficients in a field of two elements, and “reduce” that set “modulo” some
particular polynomial, the same way as the set Z of integers is reduced
modulo a prime number p to yield the field Z_p. If the coefficient field is
taken to be Q and the modulus is taken to be x^2 − 2, the result of the
process is (except for notation) the field Q(√2). If the coefficient field is taken
to be Z_2 and the modulus is taken to be an appropriately chosen polynomial
of degree 2, the result is a field with four elements. Similar techniques
work for 8, 16, 32, ... and 9, 27, 81, ... , etc.
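
One way to experiment with the polynomial construction is sketched below. It is an illustration under a stated assumption: the modulus is taken to be x^2 + x + 1, one possible choice of a degree 2 polynomial over Z_2, and the pair (a, b) stands for the residue a + bx. The function names are hypothetical.

```python
from itertools import product


def add(p, q):
    """Add a + b*x and c + d*x with coefficients in Z_2."""
    return ((p[0] + q[0]) % 2, (p[1] + q[1]) % 2)


def mul(p, q):
    """Multiply modulo x^2 + x + 1 over Z_2, using x^2 = x + 1."""
    a, b = p
    c, d = q
    # (a + b x)(c + d x) = ac + (ad + bc) x + bd x^2, and x^2 = x + 1.
    return ((a * c + b * d) % 2, (a * d + b * c + b * d) % 2)


elements = list(product((0, 1), repeat=2))   # (a, b) stands for a + b*x
one = (1, 0)
nonzero = [e for e in elements if e != (0, 0)]

# Every non-zero element should have a multiplicative inverse.
inverses = {e: next(f for f in nonzero if mul(e, f) == one) for e in nonzero}
print(inverses)
```

Checking the remaining field axioms (associativity, distributivity) over the four elements is a finite task of the same kind.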


Chapter 2. Vectors 


Hint 20. The 0 element of any additive group is characterized by the fact 
that 0 + 0 = 0. How can it happen that αx = 0? Related question worth
asking: how can it happen that αx = x?


Hint 21. (1) The scalar distributive law; (2) the scalar identity law; (3) the
vector distributive law; (4) none; (5) the associative law; (6) none. 


Hint 22. Can you solve two equations in two unknowns? 
Hint 23. (a): (1), (2), and (4); (b): (2) and (4).


Hint 24. (a) Always. (b) In trivial cases only. Draw pictures. Don't forget 
finite fields. If it were known that a vector space over an infinite field cannot 
be the union of any two of its proper subspaces, would it follow that it 
cannot be the union of any finite number? In any event: whenever M_1 and
M_2 are subspaces, try to find a vector x in M_1 but not in M_2 and a vector
y not in M_1, and consider the line through y parallel to x.


Hint 25. (a) Can it be done so that no vector in either set is a scalar multiple
of a vector in the other set? (b) Can you solve three equations in three
unknowns? 


Hint 26. Is it true that if x is a linear combination of y and something in
M, then y is a linear combination of x and something in M?


Hint 27. (a) No; that’s easy. (b) Yes; that’s very easy. (c) No; and that
takes some doing, or else previous acquaintance with the subject. (d) Yes; 
and all it requires is the definition, and minimum acquaintance with the 
concept of polynomial. 

The reader should be aware that the problem was phrased in incorrect 
but commonly accepted mathematese. Since “span” is a way to associate a 
subspace of V with each subset of V, the correct phrasing of (a) is: “is there 
a singleton that spans R^2?” Vectors alone, or even together with others (as
in (b) and (c)), don’t span subspaces; spanning is done by sets of vectors. 
The colloquialism does no harm so long as its precise meaning is not for- 
gotten. 


Hint 28. Note that since L ∩ (L ∩ N) = L ∩ N, the equation is a special
case of the distributive law. The answer to the question is yes. The harder 
half to prove is that the left side is included in the right. Essential step: 
subtract. 

Hint 29. Look at pictures in R^2.

Hint 30. No. 


Hint 31. Just look for the correct term to transpose from one side of the 
given equation to the other. 


Hint 32. Use Problem 31. 


Chapter 3. Bases 

Hint 33. Examine the case in which E consists of a single vector. 

Hint 34. It is an elementary fact that if M is an m-dimensional subspace
of an n-dimensional vector space V, then every complement of M has
dimension n − m. It follows that if several subspaces of V have a simultaneous
complement, then they all have the same dimension. Problem 24 is
relevant. 


Hint 35. (a) Irrational? (b) Zero?


Hint 36. 42?


Hint 37. (a) α = β? (b) No room. Keep in mind that the natural coefficient
field for C? is C. 


Hint 38. Why not? One way to answer (a) is to consider two independent 
vectors each of which is independent of (1, 1). One way to answer (b) is to 
adjoin (0, 0, 1, 0) and (0, 0, 1, 1) to the first two vectors and (1, 0, 0, 0) and
(1, 1, 0, 0) to the second two.


Hint 39. (a) Too much room. (b) Are there any?


Hint 40. How many vectors can there be in a maximal linearly indepen- 
dent set? 


Hint 41. What information about V '*?! does a basis of V give? 
Hint 42. Can a basis for a proper subspace span the whole space?


Hint 43. Use Problems 32 and 33. Don't forget to worry about indepen- 
dence as well as totality. 


Hint 44. Given a subspace, look for an independent set in it that is as 
large as possible. 


Hint 45. Note that finite-dimensionality was not explicitly assumed. Re- 
call that a possibly infinite set is called dependent exactly when it has a finite 
subset that is dependent. Contrariwise, a set is independent if every finite 
subset of it is independent. As for the answer, all it needs is the definitions 
of the two concepts that enter. 


Hint 46. It is tempting to apply a downward induction argument, possibly
infinite. People who know about Zorn's lemma might be tempted to use 
it, but the temptation is not likely to lead to a good result. A better way to 
settle the question is to use Problem 45. 


Hint 47. Omit one vector, express it as a linear combination of remaining 
vectors, and then omit a new vector different from all the ones used so far. 


Hint 48. A few seconds of geometric contemplation will reveal a relatively
independent subset of R^3 consisting of 5 vectors (which is n + 2 in
this case). If, however, F is the field Z_2 of integers modulo 2, then a few
seconds of computation will show that no relatively independent subset of
F^3 can contain more than 4 vectors. Why is that? What is the big difference
between F and R that is at work here?
Having thought about that question, proceed to use induction.


Hint 49. A slightly modified question seems to be easier to approach: how
many ordered bases are there? For the answer, consider one after another 
questions such as these: how many ways are there of picking the first vector 
of a basis?; once the first vector has been picked, how many ways are there 
of picking the second vector?; etc. 


Hint 50. The answer is n + m.

Hint 51. There is a sense in which the required constructions are trivial: 
no matter what V is, let M be O and let N be V. In that case V/M is the same 
as V and V/N is a vector space with only one element, so that, except for 
notation, it is the same as the vector space O. If V was infinite-dimensional 
to begin with, then this construction provides trivial affirmative answers to 
both parts of the problem. Many non-trivial examples exist; to find one, 
consider the vector space P of polynomials (over, say, the field R of real
numbers). 


Hint 52. The answer is n − m. Start with a basis of M, extend it to a basis
of V, and use the result to construct a basis of V/M. 


Hint 53. If M and N are finite subsets of a set, what relation, if any, 
is always true among the numbers card M, card N, card(M ∪ N), and
card(M ∩ N)?


Chapter 4. Transformations 


Hint 54. Squaring scalars is harmless; trying to square vectors or their 
parts is what interferes with linearity. 


Hint 55. (1) Every linear functional except one has the same range. 

(2) Compare this change of variables with the one in Problem 54 
(1 (b). 

(3) How many vectors does the range contain? 

(4) Compare this transformation with the squaring in Problem 54 
C (b. 


(5) How does this weird vector space differ from R^1?


Hint 56. (1) What do you know about a function if you know that its
indefinite integral is identically 0? 

(2) What do you know about a function if you know that its derivative 
is identically 0? 

(3) Solve two “homogeneous” equations in two unknowns. ("Homo- 
geneous” means that the right sides are 0.) 

(4) When is a polynomial 0? 

(5) What happens to the coordinate axes? 

(6) This is an old friend. 


Hint 57. (1) What could possibly go wrong? 

(2) Neither transformation goes from R? to R?. 

(3) What happens when both S and T are applied to the constant polynomial
1? What about the polynomial x?

(4) Do both products make sense? 

(5) What happens when both S and T are applied to the constant polynomial
1? What about the polynomial x? What about x^2?

(6) There is nothing to do but honest labor. 


Hint 58. Consider complements: for left divisibility, consider a comple- 
ment of ker B, and for right divisibility consider a complement of ran B. 


Hint 59. (1) If the result of applying a linear transformation to each vec- 
tor in a total set is known, then the entire linear transformation is known. 
(2) How many powers does A have? 
(3) What is Ax? 
Hint 60. Make heavy use of the linearity of T. 
Hint 61. (1) What is the kernel? (2) What is T^2? (3) What is the range?


Hint 62. Choose the entries (γ) and (δ) closely related to the entries (α)
and (β).


Hint 63. Direct sum, equal rows, and similarity. 


Hint 64. The “conjecturable” answer is too modest; many of the 0's below
the diagonal can be replaced by 1’s without losing invertibility. 


Hint 65. Start with a basis of non-invertible elements and make them 
invertible. 


Hint 66. To say that a linear transformation sends some independent set
onto a dependent set is in effect the same as saying that it sends some non- 
zero vector onto 0. 


Hint 67. If the dimension is 2, then there is only one non-trivial permu- 
tation to consider. 


Hint 68. There is nothing to do but use the general formula for matrix 
multiplication. It might help to try the 2 x 2 case first. 


Hint 69. Look at diagonal matrices. 
Hint 70. Yes. 

Hint 71. Consider differentiation. 

Hint 72. If E is a projection, what is E^2?
Hint 73. Multiply them. 


Hint 74. No and no. Don't forget to ask and answer some other natural 
questions in this neighborhood. 


Chapter 5. Duality 


Hint 75. If there were such a scalar, would it be uniquely determined by 
the prescribed linear functionals ξ and η?


Hint 76. Use a basis of V to construct a basis of V'.


Hint 77. This is very easy; just ask what information the hypothesis gives 
about the kernel of T. 


Hint 78. Does it help to assume that V is finite-dimensional? 


Hint 79. If V is R^5 and M is the set of all vectors of the form (ξ_1, ξ_2, ξ_3, 0, 0),
what is the annihilator of M? 


Hint 80. What are their dimensions? 


Hint 81. Surely there is only one sanely guessable answer.
Hint 82. Can kernels and ranges be used? 


Hint 83. There is no help for it: compute with subscripts.


Chapter 6. Similarity 

Hint 84. How does one go from x to y?

Hint 85. How does one go from η to ξ?

Hint 86. Transform from the x's to the y's, as before.


Hint 87. Use, once again, the transformation that takes one basis to the 
other, but this time in matrix form. 


Hint 88. If one of B and C is invertible, the answer is yes. 

Hint 89. Think about real and imaginary parts. That can solve the prob- 
lem, but if elementary divisor theory is an accessible tool, think about it: 
the insight will be both less computational and more deep. 

Hint 90. Extend a basis of ker A to a basis of the whole space. 

Hint 91. Yes. 


Hint 92. Look first at the case in which βγ ≠ 0.


Hint 93. The first question should be about the relation between ranges 
and sums. 


Hint 94. The easy relation is between the rank of a product and the rank 
of its first factor; how can information about that be used to get informa- 
tion about the second factor? 


Hint 95. The best relation involves null A + null B.


Hint 96. For numerical calculations the geometric definition of similarity
is easier to use than the algebraic one. 


Hint 97. There are two pairs of bases, and, consequently, it is reasonable
to expect that two transformations will appear, one for each pair. 


Hint 98. Even though the focus is on the dimensions of ranges, it might 
be wise to begin by looking at the dimensions of kernels. 


Chapter 7. Canonical Forms 


Hint 99. If A is a linear transformation, is there a connection between
the eigenvalues of A and A^2?


Hint 100. The answer is no. Can you tell by looking at a polynomial equation
what the sum and the product of the roots has to be?


Hint 101. This is not easy. Reduce the problem to the consideration of
λ = 1, and then ask whether the classical infinite series formula for 1/(1 − x)
suggests anything.

Hint 102. What about monomials? 

Hint 103. 3-1. 

Hint 104. If μ is an eigenvalue of A, consider the polynomial p(λ) − μ.
Hint 105. What are the eigenvalues? 

Hint 106. What does the assumption imply about eigenvectors? 

Hint 107. No. 


Hint 108. Look for the triangular forms that are nearest to diagonal ones 
—that is the ones for which as many as possible of the entries above the 
diagonal are equal to 0. 


Hint 109. Think about complex numbers. 


Hint 110. What can the blocks in a triangularization of A look like? 

Hint 111. The answer depends on the dimension of the space and on the
index of nilpotence; which plays the bigger role? 


Hint 112. The answer depends on size; look at matrices of size 2 and 
matrices of size 3. 


Hint 113. Examine what M does to a general vector $(\alpha, \beta, \gamma, \delta, \varepsilon)$ and
then force the issue. 


Hint 114. Don't get discouraged by minor setbacks. A possible approach
is to focus on the case $\lambda = 1$, and use the power series expansion of $\sqrt{1 + \zeta}$.


Hint 115. Use eigenvalues—they are more interesting. Matrices, how- 
ever, are quicker here. 


Hint 116. What turns out to be relevant is the Chinese Remainder Theorem.
The version of that theorem in elementary number theory says that
if $x_1, \ldots, x_n$ are integers, pairwise relatively prime, and if $y_1, \ldots, y_n$ are
arbitrary integers, then there exists an integer $z$ such that

$$z \equiv y_j \mod x_j$$

for $j = 1, \ldots, n$. A more sophisticated algebraic version of the theorem
has to do with sets of pairwise relatively prime ideals in arbitrary rings, 
which might not be commutative. The issue at hand is a special case of 
that algebraic theorem, but it can be proved directly. The ideals that enter 
are the annihilators (in the ring of all complex polynomials) of the given 
linear transformations. 
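
The number-theoretic version of the theorem quoted in Hint 116 can be tried out numerically. The following is only a minimal sketch, assuming the classical construction by modular inverses; the function name `crt` and the sample moduli are illustrative choices, not part of the hint.

```python
from math import gcd, prod

def crt(moduli, residues):
    """Return z with z ≡ residues[j] (mod moduli[j]); moduli must be pairwise coprime."""
    assert all(gcd(a, b) == 1
               for i, a in enumerate(moduli) for b in moduli[i + 1:])
    total = prod(moduli)
    z = 0
    for x_j, y_j in zip(moduli, residues):
        partial = total // x_j            # product of all the other moduli
        inverse = pow(partial, -1, x_j)   # inverse of partial modulo x_j (Python 3.8+)
        z += y_j * partial * inverse
    return z % total

z = crt([3, 5, 7], [2, 3, 2])
print(z, [z % m for m in (3, 5, 7)])
```

The same pattern (build an element that is 1 modulo one ideal and 0 modulo the others) is what the polynomial version of the argument would imitate.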


Chapter 8. Inner Product Spaces 


Hint 117. Form the inner product of a linear dependence relation with 
any one of its terms. 


Hint 118. Is there an expression for (x, y) in terms of x and y and norms,
one that involves no inner products such as (u, v) with $u \neq v$?


Hint 119. Examine both real and complex vector spaces. 


Hint 120. Always. 

Hint 121. Keep enlarging.

Hint 122. Evaluate the norms of linear combinations of x and y.


Hint 123. How close does an arbitrary vector in V come to a linear com- 
bination of an orthonormal basis for M? 


Hint 124. Look at $\ker^{\perp} \xi$.

Hint 125. (a) By definition (U*v, w) = (v, Uw); there is no help for it but 
to compute with that. (b) Yes. (c) Look at the image under U of the graph 
of A.


Hint 126. Some of the answers are yes and some are no, but there is only 
one (namely (d)) that might cause some head scratching. 


Hint 127. Is something like polarization relevant? 
Hint 128. Always? 


Hint 129. Only (c) requires more than a brief moment’s thought; there 
are several cases to look at. 


Hint 130. Problem 127 is relevant. 

Hint 131. The easy ones are (a) and (b); the slightly less easy but straight- 
forward ones are (c) and (d). The only one that requires a little thought is 
(e); don't forget that a must be real for the question to make sense. 

Hint 132. The answer is short, but a trick is needed. 

Hint 133. What is the adjoint of a perpendicular projection? 


Hint 134. A little computation never hurts. 


Hint 135. If $E \le F$ and $x$ is in ran E, evaluate $\|x - Fx\|$. If ran E $\subset$ ran F,
then $EF = E$.


Hint 136. Is the product of two perpendicular projections always a per- 
pendicular projection? 

Hint 137. Quadratic forms are relevant.


Hint 138. If $Ax_1 = \lambda_1 x_1$ and $Ax_2 = \lambda_2 x_2$, examine $(x_1, x_2)$.


Chapter 9. Normality 


Hint 139. Is the dimension of the underlying vector space finite or infi- 
nite? Is U necessarily either injective or surjective? 


Hint 140. Can “unitary” be said in matrix language? 


Hint 141. The question is whether any of the three conditions implies any
of the others, and whether any two imply the third. 


Hint 142. Must they be diagonal? 


Hint 143. Look at the eigenspaces corresponding to the distinct eigen- 
values. 


Hint 144. Diagonalize. 


Hint 145. Assume the answer and think backward. The invertible case is 
easier. 


Hint 146. Imitate the Hermitian proof. 

Hint 147. Imitate the Hermitian proof. 

Hint 148. It's a good idea to use the spectral theorem. 

Hint 149. Use Solution 148. 

Hint 150. Assuming that AS = SB, with both A and B normal, use the 
linear transformations A, B, and S, as entries in 2 x 2 matrices, so as to be 
able to apply the adjoint commutativity theorem. 


Hint 151. Put $C = B(A^*A) - (A^*A)B$ and study the trace of $C^*C$.


Hint 152. (a) Consider triangular matrices. (b) Consider the Hamilton- 
Cayley equation. 

Hint 153. Use square roots.

Hint 154. The two answers are different.


Hint 155. Some do and some don't; the emphasis is on necessarily. For 
some of the ones that don't, counterexamples of size 2 are not large enough. 


Hint 156. How many eigenvectors are there? More generally, how many 
invariant subspaces of dimension k are there, $0 \le k \le n$?


Hint 157. What's the relation between A and the matrix

$$\begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}?$$


Hint 158. Consider the polar decomposition of a transformation that
effects the similarity; a natural candidate for a unitary transformation that
effects the equivalence is its unitary factor. Don't be surprised if the
argument wants to lean on the facts about adjoint intertwining (Solution 150).


Hint 159. Most people find it difficult to make the right guess about this 
question when they first encounter it. The answer turns out to be no, but 
even knowing that does not make it easy to find a counterexample, and, 
having found one, to prove that it works. One counterexample is a 3 x 3 
nilpotent matrix, and one way to prove that it works is to compute. 


Hint 160. Solution 89 describes a way of passing from complex similarity 
to real similarity, and Solution 158 shows how to go from (real or complex) 
similarity to (real or complex) unitary equivalence. The trouble is that So- 
lution 158 needs the adjoint intertwining theorem (Solution 150), which 
must assume that the given transformations are normal. Is the assumed 
unitary equivalence sufficiently stronger than similarity to imply at least a 
special case of the intertwining theorem that can be used here? 


Hint 161. Look at the Jordan form of A. 
Hint 162. Is a modification of the argument of Solution 161 usable? 


Hint 163. If A is nilpotent of index 2, examine subspaces of the form 
N + AN, where N is a subspace of ker A*. 

Hint 164. The most obvious nilpotent transformation on $\mathbb{C}^4$ is the
truncated shift (see Problem 156), but that has index 4. It's tempting to look
at its square, but that has index 2. What along these lines can be done to 
produce nilpotence of index 3? 


SOLUTIONS 


Chapter 1. Scalars 


Solution 1. 
The associative law for double addition, expressed in terms of +, looks like this:

$$2(2\alpha + 2\beta) + 2\gamma = 2\alpha + 2(2\beta + 2\gamma),$$

which comes to

$$4\alpha + 4\beta + 2\gamma = 2\alpha + 4\beta + 4\gamma. \tag{*}$$

That can be true, but it doesn't have to be; it is true if and only if $\alpha = \gamma$.
If, for instance, $\alpha = \beta = 0$ and $\gamma = 1$, then the desired equation becomes
the falsehood

$$0 + 0 + 2 = 0 + 0 + 4. \tag{**}$$

Conclusion: the associative law for double addition is false.

Comment. Does everyone agree that an alphabetical counterexample
(such as (*)) is neither psychologically nor logically as convincing as a
numerical one (such as (**))?
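
For readers who like numerical counterexamples, here is a small sketch of the computation in Solution 1. It assumes double addition means "add the doubles", that is, 2a + 2b; the name `double_add` is an illustrative label, not notation from the text.

```python
def double_add(a, b):
    """Double addition: the sum of the doubles of a and b."""
    return 2 * a + 2 * b

# The numerical counterexample: alpha = beta = 0, gamma = 1.
left = double_add(double_add(0, 0), 1)    # (0 + 0) then 1
right = double_add(0, double_add(0, 1))   # 0 then (0 + 1)

# On a small grid of integers, associativity holds exactly when the outer terms agree.
grid = range(-3, 4)
holds_iff_equal = all(
    (double_add(double_add(a, b), c) == double_add(a, double_add(b, c))) == (a == c)
    for a in grid for b in grid for c in grid
)
print(left, right, holds_iff_equal)
```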

Solution 2.

The associative law for half double addition is false. The equation

$$\alpha \boxplus (\beta \boxplus \gamma) = (\alpha \boxplus \beta) \boxplus \gamma$$

says that

$$2\alpha + (2\beta + \gamma) = 2(2\alpha + \beta) + \gamma,$$

which is true if and only if $\alpha = 0$. If, for instance, $\alpha = 1$ and $\beta = \gamma = 0$,
then the desired equation becomes the falsehood

$$2 + (0 + 0) = 2(2 + 0) + 0.$$


Solution 3. 


For both commutativity and associativity it is harder to find instances where 
they hold than instances where they don't. Thus, for instance, 


$$(\alpha^\beta)^\gamma = \alpha^{(\beta^\gamma)}$$

is true only if $\alpha = 1$, or $\gamma = 1$, or $\beta = \gamma = 2$. If, in particular, $\alpha = \gamma = 2$
and $\beta = 1$, then it is false. Exponentiation is neither commutative nor
associative.


Solution 4. 


Both answers are yes, and one way to prove them is to compute. Since 


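$$(\gamma, \delta) \boxdot (\alpha, \beta) = (\gamma\alpha - \delta\beta,\ \gamma\beta + \delta\alpha),$$

the commutativity of $\boxdot$ is a consequence of the commutativity of the
ordinary multiplication of real numbers.

The computation for associativity needs more symbols:

$$\bigl((\alpha, \beta) \boxdot (\gamma, \delta)\bigr) \boxdot (\varepsilon, \varphi)
= \bigl((\alpha\gamma - \beta\delta)\varepsilon - (\alpha\delta + \beta\gamma)\varphi,\ (\alpha\gamma - \beta\delta)\varphi + (\alpha\delta + \beta\gamma)\varepsilon\bigr)$$

and

$$(\alpha, \beta) \boxdot \bigl((\gamma, \delta) \boxdot (\varepsilon, \varphi)\bigr)
= \bigl(\alpha(\gamma\varepsilon - \delta\varphi) - \beta(\gamma\varphi + \delta\varepsilon),\ \alpha(\gamma\varphi + \delta\varepsilon) + \beta(\gamma\varepsilon - \delta\varphi)\bigr).$$

By virtue of the associativity of the ordinary multiplication of real numbers
the same eight triple products, with the same signs, occur in the right-hand
sides of both these equations.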

For people who know about complex numbers and know that for them
addition and multiplication are both commutative and associative,
Problem 4 takes just as little work as the paragraph that introduced it.
Indeed: if $(\alpha, \beta)$ is thought of as $\alpha + \beta i$ then $\boxplus$ and $\boxdot$ become
“ordinary” complex addition and multiplication, and after that insight nothing
remains to be done.
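
The identification with complex numbers can be spot-checked by machine. The sketch below assumes the product rule (a, b)(c, d) = (ac - bd, ad + bc) from the solution; `pair_mul` is a hypothetical helper name, and the check runs only over a small grid of integer pairs, so it illustrates rather than proves.

```python
from itertools import product

def pair_mul(p, q):
    """Multiply ordered pairs by the rule (a, b)(c, d) = (ac - bd, ad + bc)."""
    (a, b), (c, d) = p, q
    return (a * c - b * d, a * d + b * c)

pairs = list(product(range(-2, 3), repeat=2))

commutative = all(pair_mul(p, q) == pair_mul(q, p) for p in pairs for q in pairs)
associative = all(
    pair_mul(pair_mul(p, q), r) == pair_mul(p, pair_mul(q, r))
    for p in pairs for q in pairs for r in pairs
)
# Thinking of (a, b) as a + bi turns pair_mul into ordinary complex multiplication.
matches_complex = all(
    complex(*pair_mul(p, q)) == complex(*p) * complex(*q)
    for p in pairs for q in pairs
)
print(commutative, associative, matches_complex)
```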


Solution 5. 


Straightforward computation shows that the equation 


$$(\alpha, \beta) \boxdot (\gamma, \delta) = (\gamma, \delta) \boxdot (\alpha, \beta)$$

is a severe condition that is quite unlikely to be satisfied. An explicit
counterexample is given by

$$(1, 1) \boxdot (2, 1) = (2, 2)$$

and

$$(2, 1) \boxdot (1, 1) = (2, 3).$$


The associativity story is quite different; there straightforward
computation shows that it is always true. This way of multiplying pairs of real
numbers is not a weird invention; it arises in a natural classical context. An
affine transformation of the real line is a mapping S defined for each real
number $\xi$ by an equation of the form $S(\xi) = \alpha\xi + \beta$, where $\alpha$ and $\beta$
themselves are fixed preassigned real numbers. If T is another such mapping,
$T(\xi) = \gamma\xi + \delta$, then the composition ST (for the purist: $S \circ T$) is given by

$$(ST)(\xi) = S(\gamma\xi + \delta) = \alpha(\gamma\xi + \delta) + \beta = (\alpha\gamma)\xi + (\alpha\delta + \beta).$$

In other words, the product ST of the transformations corresponding
to

$$(\alpha, \beta) \quad\text{and}\quad (\gamma, \delta)$$

is exactly the transformation corresponding to $(\alpha, \beta) \boxdot (\gamma, \delta)$. Since the
operation of composing transformations is always associative, the
associativity of $\boxdot$ can be inferred with no further computation.

Is that all right? Is the associativity of functional composition accepted?
If it is not accepted, it can be proved as follows. Suppose that R, S, and T
are mappings of a set into itself, and write P = RS, Q = ST. Then, for
each x in the domain,

$$\begin{aligned}
((RS)T)(x) = (PT)(x) &= P(T(x)) && \text{[by the definition of } PT\text{]} \\
&= (RS)(T(x)) = R(S(T(x))) && \text{[by the definition of } RS\text{]},
\end{aligned}$$

whereas

$$\begin{aligned}
(R(ST))(x) = (RQ)(x) &= R(Q(x)) && \text{[by the definition of } RQ\text{]} \\
&= R((ST)(x)) = R(S(T(x))) && \text{[by the definition of } ST\text{]}.
\end{aligned}$$


Since the last terms of these two chains of equations are equal, the first 
ones must be also. 
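
The correspondence between pairs and affine transformations used in Solution 5 can be illustrated as follows. This is a sketch under the assumption that the pair (a, b) stands for the map that sends xi to a*xi + b; the names `affine_mul` and `as_map` are hypothetical.

```python
def affine_mul(p, q):
    """Product of pairs by the rule (a, b)(c, d) = (ac, ad + b)."""
    (a, b), (c, d) = p, q
    return (a * c, a * d + b)

def as_map(p):
    """The affine transformation xi -> a*xi + b attached to the pair (a, b)."""
    a, b = p
    return lambda xi: a * xi + b

# The counterexample to commutativity from the solution.
first = affine_mul((1, 1), (2, 1))
second = affine_mul((2, 1), (1, 1))

# The product of pairs corresponds to composition of the maps (checked at sample points).
S, T = (3, -2), (5, 7)
composition_matches = all(
    as_map(affine_mul(S, T))(xi) == as_map(S)(as_map(T)(xi)) for xi in range(-5, 6)
)
print(first, second, composition_matches)
```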


Solution 6. 


In view of the comment about Problem 5 being a special case, it follows
immediately that the present $\boxdot$ is not commutative. To get a
counterexample, take any two pairs that do not commute for the $\boxdot$ of Problem
5 and use each of them as the beginning of a quadruple whose last two
coordinates are 0 and 1. Concretely:

$$(1, 1, 0, 1) \boxdot (2, 1, 0, 1) = (2, 2, 0, 1)$$

and

$$(2, 1, 0, 1) \boxdot (1, 1, 0, 1) = (2, 3, 0, 1).$$


Associativity is harder. It was true for Problem 5 and it might con- 
ceivably have become false when the domain was enlarged for Problem 6. 
There is no help for it but to compute; the result is that the associative law 
is true here. 

For those who know about the associativity of multiplication for
2 x 2 matrices no computation is necessary; just note that if a quadruple
$(\alpha, \beta, \gamma, \delta)$ is written as

$$\begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix},$$

then the present product coincides with the ordinary matrix product.


Solution 7. 


The worst way to solve the problem is to say that there are only 36 (six times 
six) possible ordered pairs and only 216 (six times six times six) possible 
ordered triples that can be formed with 0, 1, 2, 3, 4, 5: in principle the
commutativity and associativity questions can be decided by examining all 
of them. 

A better way, for commutativity for instance, is to note that if each of
$\alpha$ and $\beta$ is one of the numbers 0, 1, 2, 3, 4, 5, and if the largest multiple of 6
that doesn't exceed their ordinary product is, say, $6k$, so that $\alpha\beta = \gamma + 6k$,
where $\gamma$ is one of the numbers 0, 1, 2, 3, 4, 5, then, because ordinary
multiplication is commutative, the same conclusion holds for $\beta\alpha$. Consequence:

$$\alpha \boxdot \beta = \gamma$$

and

$$\beta \boxdot \alpha = \gamma.$$

The reasoning to prove associativity works similarly: the language and the
notation have to be chosen with care but there are no traps and no
difficulties.
The intellectually most rewarding way is to use the hint. If m and n
are non-negative integers, then each of them is one of the numbers 0, 1,
2, 3, 4, 5 plus a multiple of 6 (possibly the zero multiple). Establish some
notation: say $m = \alpha$ plus a multiple of 6, and $n = \beta$ plus a multiple
of 6, so that $r(m) = \alpha$ and $r(n) = \beta$. (The reason for “r” is to be reminded
of “reduce”.) Consequence: when $mn$ and $\alpha\beta$ are reduced modulo 6 they yield
the same result. (Think about this step for a minute.) Conclusion:

$$r(mn) = r(m) \boxdot r(n),$$

as the hint promised.

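This was work, but it uses a standard technique in algebra (it's called
homomorphism and it will be studied systematically later), and it pays off.
Suppose, for instance, that each of $\alpha$, $\beta$, and $\gamma$ is one of 0, 1, 2, 3, 4, 5, so
that $r(\alpha) = \alpha$ and $r(\beta) = \beta$, $r(\gamma) = \gamma$. The proof of the associative law
can be arranged as follows:

$$\begin{aligned}
(\alpha \boxdot \beta) \boxdot \gamma &= (r(\alpha) \boxdot r(\beta)) \boxdot r(\gamma) \\
&= r(\alpha\beta) \boxdot r(\gamma) && \text{[by the preceding paragraph]} \\
&= r((\alpha\beta)\gamma) && \text{[ditto]} \\
&= r(\alpha(\beta\gamma)) && \text{[because ordinary multiplication is associative]} \\
&= r(\alpha) \boxdot r(\beta\gamma) = r(\alpha) \boxdot (r(\beta) \boxdot r(\gamma)) \\
&= \alpha \boxdot (\beta \boxdot \gamma);
\end{aligned}$$

the last three equalities just unwind what the first three wound up.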

An important difference between the modular arithmetic of 6 and 7
will become visible later, but for most of the theory they act the same way, 
and that is true, in particular, as far as commutativity and associativity are 
concerned. 
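
The "worst way" described at the start of Solution 7, exhaustive checking, is tedious by hand but trivial for a machine, and the reduction property r(mn) = r(m) r(n) can be sampled the same way. The sketch below assumes r means "remainder after division by 6"; the names `r` and `mul6` are illustrative, and the homomorphism check covers only a finite sample of integers.

```python
def r(m):
    """Reduce a non-negative integer modulo 6."""
    return m % 6

def mul6(a, b):
    """Multiplication modulo 6 on the numbers 0, 1, 2, 3, 4, 5."""
    return r(a * b)

Z6 = range(6)

commutative = all(mul6(a, b) == mul6(b, a) for a in Z6 for b in Z6)          # 36 pairs
associative = all(mul6(mul6(a, b), c) == mul6(a, mul6(b, c))
                  for a in Z6 for b in Z6 for c in Z6)                        # 216 triples
homomorphism = all(r(m * n) == mul6(r(m), r(n))
                   for m in range(60) for n in range(60))                     # finite sample
print(commutative, associative, homomorphism)
```

Replacing 6 by 7 throughout gives the same three answers.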


Solution 8. 


The answer may or may not be easy to guess, but once it’s correctly guessed 
it’s easy to prove. The answer is yes, and anyone who believes that and sets 
out to construct an example is bound to succeed. 

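Call the three elements for which multiplication is to be defined $\alpha$, $\beta$,
and $\gamma$; the problem is to construct a multiplication table that is
commutative but not associative.

Question: what does commutativity say about the table? Answer:
symmetry about the principal diagonal (top left to bottom right). That is: if the
entry in row $\alpha$ and column $\beta$ is, say, $\gamma$, then the entry in row $\beta$ and column
$\alpha$ must also be $\gamma$.

How can associativity be avoided? How, for instance, can it be
guaranteed that

$$(\alpha \times \beta) \times \gamma \neq \alpha \times (\beta \times \gamma)?$$

Possible approach: make $\alpha \times \beta = \gamma$ and $\beta \times \gamma = \alpha$; then the associative
law will surely fail if $\gamma \times \gamma$ and $\alpha \times \alpha$ are different. That's easy enough to
achieve and the following table is one way to do it:

$$\begin{array}{c|ccc}
\times & \alpha & \beta & \gamma \\ \hline
\alpha & \alpha & \gamma & \beta \\
\beta & \gamma & \beta & \alpha \\
\gamma & \beta & \alpha & \gamma
\end{array}$$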


Here, for what it's worth, is a verbal description of this multiplication: 
the product of two distinct factors is the third element of the set, and the 
product of any element with itself is that element again. 

This is not the only possible solution of the problem, but it's one that
has an amusing relation to the double addition in Problem 1. Indeed, if
the notation is changed so as to replace $\alpha$ by 0, $\beta$ by 2, and $\gamma$ by 1, then the
present $\times$ satisfies the equation

$$\alpha \times \beta = 2\alpha + 2\beta,$$

where the plus sign on the right-hand side denotes addition modulo 3.
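
The verbal description of the multiplication in Solution 8 is enough to test all its claimed properties by machine. In the sketch below the three elements are written "a", "b", "c" in place of the Greek letters, and `star` is a hypothetical name for the operation; the relabelling 0, 2, 1 is the one given in the solution.

```python
elements = ["a", "b", "c"]

def star(x, y):
    """An element times itself is itself; two distinct elements give the third."""
    if x == y:
        return x
    return next(z for z in elements if z not in (x, y))

commutative = all(star(x, y) == star(y, x) for x in elements for y in elements)
associative = all(star(star(x, y), z) == star(x, star(y, z))
                  for x in elements for y in elements for z in elements)

# Relabel a -> 0, b -> 2, c -> 1 and compare with 2x + 2y modulo 3.
code = {"a": 0, "b": 2, "c": 1}
is_double_addition = all(
    code[star(x, y)] == (2 * code[x] + 2 * code[y]) % 3
    for x in elements for y in elements
)
print(commutative, associative, is_double_addition)
```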

Solution 9.


(1) How could a real number $\varepsilon$ be an identity element for double addition?
That is, can it be that

$$2\alpha + 2\varepsilon = \alpha$$

for all $\alpha$? Clearly not: the equation holds only when $\alpha = -2\varepsilon$, so that, in
particular, it does not hold when $\alpha = 1$ and $\varepsilon = 0$.

(2) The answer is slightly different for half double addition. It is still
true that for no $\varepsilon$ does

$$2\alpha + \varepsilon = \alpha$$

hold for all $\alpha$, but since this operation is not commutative at least a glance
at the other order is called for. Could it be, for some $\varepsilon$, that

$$2\varepsilon + \alpha = \alpha$$

for all $\alpha$? Sure: just put $\varepsilon = 0$. That is: half double addition has no right
identity element but it does have a left identity.

(3) Exponentiation behaves similarly but backward. There is a right
identity, namely 1 ($\alpha^1 = \alpha$ for all $\alpha$), but there is no left identity ($\varepsilon^\alpha = \alpha$
for all $\alpha$ is impossible no matter what $\varepsilon$ is).

(4) The ordered pair (1, 0) (or, if preferred, the complex number
$1 + 0 \cdot i$) is an identity for complex multiplication (both left and right, since
multiplication is commutative). 

(5) The ordered pair (1, 0) does the job again, but this time, since
multiplication is not commutative, the pertinent equations have to be checked
both ways:

$$(\alpha, \beta) \times (1, 0) = (\alpha, \beta)$$

and

$$(1, 0) \times (\alpha, \beta) = (\alpha, \beta).$$

Equivalently: the identity mapping I, defined by $I(\alpha) = \alpha$, is an affine
transformation that is both a right and a left unit for functional
composition. That is: if S is an affine transformation, then

$$I \circ S = S \circ I = S.$$


(6) The quadruple (1, 0, 0, 1) is a unit for matrix multiplication (both
left and right), or, if preferred, the identity matrix

$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

is an identity element.

Since complex multiplication and affine multiplication are known
to be special cases of matrix multiplication (see Problem 6), it should come
as no surprise to learn that the identity elements described in (4) and (5)
above are special cases of the one described in (6).

(7) Modular addition and multiplication cause the least trouble: 0 does 
the job for +, and 1 does it for x. 


Solution 10. 


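Given $\alpha$ and $\beta$, can one find $\gamma$ and $\delta$ so that the product of $(\alpha, \beta)$ and
$(\gamma, \delta)$ is (1, 0)? The problem reduces to the solution of two equations in
the unknowns $\gamma$ and $\delta$:

$$\alpha\gamma - \beta\delta = 1,$$
$$\alpha\delta + \beta\gamma = 0.$$

The standard elementary techniques for doing that yield an answer in every
case, provided only that

$$\alpha^2 + \beta^2 \neq 0,$$

or in other words (since $\alpha$ and $\beta$ are real numbers) provided only that not
both $\alpha$ and $\beta$ are 0.

Alternatively: since in the customary complex notation

$$\frac{1}{\alpha + \beta i} = \frac{\alpha - \beta i}{(\alpha + \beta i)(\alpha - \beta i)} = \frac{\alpha}{\alpha^2 + \beta^2} - \frac{\beta}{\alpha^2 + \beta^2}\, i,$$

it follows that $(\alpha, \beta)$ is invertible if and only if $\alpha^2 + \beta^2 \neq 0$, and, if that
condition is satisfied, then

$$(\alpha, \beta)^{-1} = \left( \frac{\alpha}{\alpha^2 + \beta^2},\ \frac{-\beta}{\alpha^2 + \beta^2} \right).$$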


Solution 11. 


The equations to be solved are almost trivial in this case. The problem is,
given $(\alpha, \beta)$, to find $(\gamma, \delta)$ so that

$$\alpha\gamma = 1 \quad\text{and}\quad \alpha\delta + \beta = 0.$$

The first equation has a solution if and only if $\alpha \neq 0$, and, if that is so, then
the second equation is solvable also. Conclusion: $(\alpha, \beta)$ is invertible if and
only if $\alpha \neq 0$, and, if so, then

$$(\alpha, \beta)^{-1} = \left( \frac{1}{\alpha},\ -\frac{\beta}{\alpha} \right).$$

Caution: this multiplication is not commutative, and the preceding
computation guarantees a right inverse only. Does it work on the left too?
Check it:

$$\left( \frac{1}{\alpha},\ -\frac{\beta}{\alpha} \right) \times (\alpha, \beta) = \left( \frac{1}{\alpha}\,\alpha,\ \frac{1}{\alpha}\,\beta - \frac{\beta}{\alpha} \right) = (1, 0).$$


Solution 12. 


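It is time to abandon the quadruple notation and the symbol $\times$; from now
on write

$$\begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$$

instead of $(\alpha, \beta, \gamma, \delta)$ and indicate multiplication by juxtaposition (placing
the two symbols next to one another) instead of by $\times$. The problem is, given
a matrix

$$\begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix},$$

to determine whether or not there exists a matrix

$$\begin{pmatrix} \alpha' & \beta' \\ \gamma' & \delta' \end{pmatrix}$$

such that

$$\begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} \begin{pmatrix} \alpha' & \beta' \\ \gamma' & \delta' \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

What is asked for is a solution of four equations in four unknowns. The
standard solution techniques are easy enough to apply, but they are, of
course, rather boring. There is no help for it, for the present; an elegant
general context into which all this fits will become visible only after some of
the theory of linear algebra becomes known. The answer is that
$\begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$
is invertible if and only if $\alpha\delta - \beta\gamma \neq 0$, and, if that is so, then

$$\begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}^{-1} = \begin{pmatrix} \dfrac{\delta}{\alpha\delta - \beta\gamma} & \dfrac{-\beta}{\alpha\delta - \beta\gamma} \\[2ex] \dfrac{-\gamma}{\alpha\delta - \beta\gamma} & \dfrac{\alpha}{\alpha\delta - \beta\gamma} \end{pmatrix}.$$

Readers reluctant to derive the result stand to gain something by at least
checking it, that is by carrying out the two multiplications

$$\begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} \begin{pmatrix} \alpha' & \beta' \\ \gamma' & \delta' \end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix} \alpha' & \beta' \\ \gamma' & \delta' \end{pmatrix} \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix},$$

and noting that they yield the same answer, namely

$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$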

Comment. The present result applies, in particular, to the special matrices
$\begin{pmatrix} \alpha & \beta \\ -\beta & \alpha \end{pmatrix}$, which are, except for notation, the same as the complex
numbers discussed in Problem 4. It follows that such a special matrix is
invertible if and only if $\alpha\alpha - \beta(-\beta) \neq 0$, which is of course the same
condition as $\alpha^2 + \beta^2 \neq 0$. (The awkward form is intended to serve as a
reminder of how it arose this time.) If that condition is satisfied, then the
inverse is

$$\begin{pmatrix} \dfrac{\alpha}{\alpha^2 + \beta^2} & \dfrac{-\beta}{\alpha^2 + \beta^2} \\[2ex] \dfrac{\beta}{\alpha^2 + \beta^2} & \dfrac{\alpha}{\alpha^2 + \beta^2} \end{pmatrix},$$

and that is exactly the matrix that corresponds to the complex number

$$\left( \frac{\alpha}{\alpha^2 + \beta^2},\ \frac{-\beta}{\alpha^2 + \beta^2} \right),$$

in perfect harmony with Solution 10.

Similarly the special matrices $\begin{pmatrix} \alpha & \beta \\ 0 & 1 \end{pmatrix}$ are the same as the affine
transformations $(\alpha, \beta)$ discussed in Problem 5. According to the present
result such a special matrix is invertible if and only if $\alpha \cdot 1 - \beta \cdot 0 \neq 0$, and
in that case the inverse is $\left( \frac{1}{\alpha}, -\frac{\beta}{\alpha} \right)$, in perfect harmony with Solution 11.

It is a consequence of these comments that not only is Problem 6 a
generalization of Problems 4 and 5, but, correspondingly, Solution 12 has
Solutions 10 and 11 as special cases.
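
Readers who would rather let a machine carry out the two multiplications can use a sketch like the following. It assumes the quadruple (a, b, c, d) stands for the 2 x 2 matrix with rows (a, b) and (c, d), and it uses exact rational arithmetic; the names `mat_mul` and `mat_inv` are hypothetical helpers, not notation from the text.

```python
from fractions import Fraction

def mat_mul(m, n):
    """Ordinary product of 2 x 2 matrices stored as quadruples (a, b, c, d)."""
    a, b, c, d = m
    e, f, g, h = n
    return (a * e + b * g, a * f + b * h, c * e + d * g, c * f + d * h)

def mat_inv(m):
    """Inverse by the formula of Solution 12; fails when ad - bc = 0."""
    a, b, c, d = (Fraction(t) for t in m)
    det = a * d - b * c
    if det == 0:
        raise ValueError("ad - bc = 0: the matrix is not invertible")
    return (d / det, -b / det, -c / det, a / det)

identity = (1, 0, 0, 1)
m = (2, 1, 5, 3)
both_sides = (mat_mul(m, mat_inv(m)), mat_mul(mat_inv(m), m))

# Special cases: a "complex" matrix (a, b, -b, a) and an "affine" matrix (a, b, 0, 1).
complex_case = mat_inv((3, 4, -4, 3))
affine_case = mat_inv((2, 3, 0, 1))
print(both_sides, complex_case, affine_case)
```

The affine case returns the quadruple (1/2, -3/2, 0, 1), which is the pair (1/a, -b/a) of Solution 11 followed by the fixed coordinates 0 and 1.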


Solution 13. 


(a) The verification that min is both commutative and associative is
straightforward. If anything goes wrong, it must have to do with the existence of a
neutral element, an identity element, that plays the role of 0. The question
is this: does there exist a positive real number z such that

$$\min(x, z) = x$$

for every positive real number x? The equation demands that z be greater
than or equal to every positive real number x, in other words that z be
"the largest real number". That's nonsense: there is no such thing; the
present candidate fails to be a group.

(b) The verification of commutativity and associativity is easy again.
The search for 0 this time amounts to the search for a number z in the set
{1, 2, 3, 4, 5} with the property that

$$\max(x, z) = x$$

for every number x in the set. The equation demands that z be less than or
equal to every positive integer between 1 and 5, and that's easy; the number
1 does the job. It remains to look for inverses. Given x, can we find y so
that max(x, y) = 1? No, that's impossible: the equation can never be
satisfied unless x = y = 1.

(c) Given that x + y = y, add (-y) to both sides of the equation. The
right side becomes 0, and the left side becomes

$$(x + y) + (-y) = x + (y + (-y)) = x + 0 = x,$$

and, consequently, x = 0.


Comment. What went wrong in (a) was caused by the non-existence of a
largest positive real number. What happens if $\mathbb{R}_+$ is replaced by a bounded
set of positive real numbers, such as the closed unit interval [0, 1]? Does
the operation min produce a group then? Commutativity, associativity, and
the existence of a zero element are satisfied (the role of 0 being played by
1); the question is about inverses. Is it true that to every number x in [0, 1]
there corresponds a number y in [0, 1] such that min(x, y) = 1? Certainly
not; that can happen only if x = 1.

Does the argument for (c) use the commutativity of +? Associativity?
Both the defining properties of 0?


Solution 14. 


The set of those affine transformations

$$\xi \mapsto \alpha\xi + \beta$$

(discussed in Problem 5) for which $\alpha \neq 0$ does not have the first of the
defining properties of abelian groups (commutativity), but it has all the
others (the associative law, the existence of an identity element, and the
existence of an inverse for every element), as Problem 11 shows; it is a group.

The set of invertible 2 x 2 matrices is not commutative, but has the 
other properties of abelian groups (see Problem 12); it is a group. 

The product 2 x 3 is equal to 0 modulo 6. That is: multiplication
modulo 6 is not defined in the domain in question, or, in other words, the set
{1, 2, 3, 4, 5} is not closed under the operation. Conclusion: the non-zero
integers modulo 6 do not form a multiplicative group.

If $\alpha$ is any one of the numbers 1, 2, 3, 4, 5, 6, what can be said about
the numbers

$$\alpha \times 1,\ \alpha \times 2,\ \alpha \times 3,\ \alpha \times 4,\ \alpha \times 5,\ \alpha \times 6$$

(multiplication modulo 7)? First answer: none of them is 0 (modulo 7).
(Why? This is important, and it requires a moment's thought.) Second (as
a consequence of the first): they are all different. (Why?) Third (as a
consequence of the second): except possibly for the order in which they appear,
they are the same as the numbers 1, 2, 3, 4, 5, 6, and therefore, in
particular, one of them is 1. That is: for each number $\alpha$ there is a number $\beta$ such
that $\alpha \times \beta = 1$: this is exactly the assertion that every $\alpha$ has a multiplicative
inverse. Conclusion: the non-zero integers modulo 7 form a multiplicative
group.
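
The contrast between 6 and 7 in Solution 14 can be seen by listing products. The sketch below is a brute-force check, assuming only the definitions of multiplication modulo n; the function name `nonzero_form_group` is an illustrative label, and associativity and the identity 1 are taken for granted because they are inherited from ordinary multiplication.

```python
def nonzero_form_group(n):
    """Check closure and inverses for 1, ..., n-1 under multiplication modulo n."""
    nonzero = range(1, n)
    closed = all((a * b) % n != 0 for a in nonzero for b in nonzero)
    inverses = all(any((a * b) % n == 1 for b in nonzero) for a in nonzero)
    return closed and inverses

# Modulo 7, each row a*1, ..., a*6 is a rearrangement of 1, ..., 6.
rows_are_rearrangements = all(
    sorted((a * b) % 7 for b in range(1, 7)) == [1, 2, 3, 4, 5, 6]
    for a in range(1, 7)
)
print(nonzero_form_group(6), nonzero_form_group(7), rows_are_rearrangements)
```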


Solution 15. 


If there are only two distinct elements, an identity element 1 and another
one, say $\alpha$, then the “multiplication table” for the operation looks like

$$\begin{array}{c|cc}
 & 1 & \alpha \\ \hline
1 & 1 & \alpha \\
\alpha & \alpha & ?
\end{array}$$

If the question mark is replaced by 1, the operation is associative; if it is
replaced by $\alpha$, then the element $\alpha$ has no inverse. Conclusion: two elements
are not enough to provide a counterexample.

If there are three distinct elements, an identity 1, and two others, $\alpha$
and $\beta$, then there is more elbow room, and, for instance, one possibility is

$$\begin{array}{c|ccc}
 & 1 & \alpha & \beta \\ \hline
1 & 1 & \alpha & \beta \\
\alpha & \alpha & x & 1 \\
\beta & \beta & 1 & y
\end{array}$$

No matter what x and y are (among 1, $\alpha$, and $\beta$) the operation that the
table defines has an identity and every element has an inverse. If $x = \beta$
and $y = \alpha$, the result is associative, so that it does not serve as an example
of the sort of thing wanted. If, however, $x = \alpha$, then

$$(\alpha\alpha)\beta = \alpha\beta = 1$$

and

$$\alpha(\alpha\beta) = \alpha 1 = \alpha,$$

so that the operation is not associative (and the same desired negative
conclusion follows if $y = \beta$).
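
The nine possible ways of filling in x and y can be examined exhaustively. This sketch assumes the three-element table described above (1 is the identity, the product of the two other elements is 1 in either order, and x and y are the two squares); the elements are written "1", "a", "b", and `make_op` and `is_associative` are hypothetical names.

```python
from itertools import product

E = ["1", "a", "b"]

def make_op(x, y):
    """Build the operation with a*a = x, b*b = y, a*b = b*a = 1, and 1 as identity."""
    table = {("a", "a"): x, ("b", "b"): y, ("a", "b"): "1", ("b", "a"): "1"}
    def op(p, q):
        if p == "1":
            return q
        if q == "1":
            return p
        return table[(p, q)]
    return op

def is_associative(op):
    return all(op(op(p, q), r) == op(p, op(q, r)) for p, q, r in product(E, repeat=3))

associative_choices = [(x, y) for x in E for y in E if is_associative(make_op(x, y))]
print(associative_choices)
```

Every other choice of x and y therefore gives an operation with an identity and inverses that is not associative.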


Solution 16. 


Yes, everything is fine: multiplication in a field must be commutative, and,
in particular, $0 \cdot x = x \cdot 0 = 0$ for every x, but it's a good idea to look at the
sort of thing that can go wrong if not both distributive laws are assumed.
Question: if F is an abelian group with +, and if F* is an abelian group with
$\times$, and if the distributive law

$$\alpha(x + y) = \alpha x + \alpha y$$

is true for all $\alpha$, x and y, does it follow that multiplication in F is
commutative? Answer: no. Here is an artificial but illuminating example.

Let F be the set of two integers 0 and 1 with addition defined modulo
2, and with multiplication defined so that $x \cdot 0 = 0$ for all x (that is, for
x = 0 and for x = 1) and $x \cdot 1 = 1$ for all x. (Recall that in addition
modulo 12 multiples of 12 are discarded; in addition modulo 2 multiples
of 2 are discarded. The only thing peculiar about addition modulo 2 is that
1 + 1 = 0.) It is clear that F with + is an abelian group, and it is even clearer
that F* (which consists of the single element 1) with $\times$ is an abelian group.
The distributive law

$$\alpha(x + y) = \alpha x + \alpha y$$

is true; to prove it, just examine the small finite number of possible cases.
On the other hand the distributive law

$$(\alpha + \beta)x = \alpha x + \beta x$$

is not true; indeed

$$(0 + 1) \cdot 1 = 1$$

and

$$0 \cdot 1 + 1 \cdot 1 = 1 + 1 = 0.$$

Irrelevant side remark: the associative law $\alpha(\beta\gamma) = (\alpha\beta)\gamma$ is true
(straightforward verification). The commutative law is false, by definition:
$0 \cdot 1 = 1$ and $1 \cdot 0 = 0$.
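
The "small finite number of possible cases" can be run through mechanically. The sketch below assumes the artificial example just described (addition modulo 2, and a product that always returns its second factor); the names `add` and `mul` are illustrative.

```python
F = (0, 1)

def add(x, y):
    """Addition modulo 2."""
    return (x + y) % 2

def mul(x, y):
    """The artificial product: x*0 = 0 and x*1 = 1 for every x."""
    return y

left_distributive = all(mul(a, add(x, y)) == add(mul(a, x), mul(a, y))
                        for a in F for x in F for y in F)
right_distributive = all(mul(add(a, b), x) == add(mul(a, x), mul(b, x))
                         for a in F for b in F for x in F)
associative = all(mul(a, mul(b, c)) == mul(mul(a, b), c)
                  for a in F for b in F for c in F)
commutative = all(mul(a, b) == mul(b, a) for a in F for b in F)
print(left_distributive, right_distributive, associative, commutative)
```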

If, however, both distributive laws are assumed, in other words, if the
system under consideration is a bona fide field, then all is well. Indeed,
since

$$(0 + 1)x = 0 \cdot x + 1 \cdot x$$

for all x, and since the left side of this equation is x whereas the right side
is

$$0 \cdot x + x,$$

it follows (from Problem 1) that

$$0 \cdot x = 0$$

for all x. A similar use of the other distributive law,

$$x(0 + 1) = x \cdot 0 + x \cdot 1,$$

implies that

$$x \cdot 0 = 0$$

for all x. In other words, every product that contains 0 as a factor is equal
to 0, and that implies everything that's wanted, and it implies, in particular,
that multiplication is both associative and commutative.


Solution 17. 


(a) It is to be proved that $0 \times \alpha$ acts the way 0 does, so that what must be
shown is that $0 \times \alpha$ added to any $\beta$ yields $\beta$. It must in particular be true that
$(0 \times \alpha) + \alpha = \alpha$ $(= 0 + \alpha)$, and, in fact, that's enough: if that is true then
the additive cancellation law implies that $0 \times \alpha = 0$. The proof therefore
can be settled by the following steps:

$$\begin{aligned}
(0 \times \alpha) + \alpha &= (0 \times \alpha) + (1 \times \alpha) && \text{(because 1 is the multiplicative unit)} \\
&= (0 + 1) \times \alpha && \text{(by the distributive law)} \\
&= 1 \times \alpha && \text{(because 0 is the additive unit)} \\
&= \alpha.
\end{aligned}$$

(b) It is to be proved that $(-1)\alpha$ acts the way $-\alpha$ does, so that what
must be shown is that $\alpha + (-1)\alpha = 0$. Proof:

$$\alpha + (-1)\alpha = (1 \times \alpha) + ((-1) \times \alpha) = (1 + (-1)) \times \alpha = 0 \times \alpha = 0.$$

(c) It helps to know “half” of the asserted equation, namely

$$(-\alpha)\beta = -(\alpha\beta),$$

and the other, similar, half

$$\alpha(-\beta) = -(\alpha\beta).$$

The first half is true because

$$\begin{aligned}
\alpha\beta + (-\alpha)\beta &= (\alpha + (-\alpha))\beta && \text{(distributive law)} \\
&= 0 \times \beta = 0,
\end{aligned}$$

which shows that $(-\alpha)\beta$ indeed acts just the way $-(\alpha\beta)$ is supposed to.
The other half is proved similarly. The proof of the main assertion is now
an easy two step deduction:

$$(-\alpha)(-\beta) = -(\alpha(-\beta)) = -(-(\alpha\beta)) = \alpha\beta.$$


(d) This is not always true. Counterexample: integers modulo 2. (See 
Problem 18.) 

(e) By definition the non-zero elements of F constitute a multiplicative 
group, which says, in particular, that the product of two of them is again 
one of them. 


Solution 18. 


The answer is yes. The example illustrates the possible failure of the dis- 
tributive law and hence emphasizes the essential role of that law. 

Let F be {0, 1, 2, 3, 4}, with + being addition modulo 5 and $\times_1$ being
multiplication modulo 5. In this case all is well; $(F, +, \times_1)$ is a field.

An efficient way of defining a suitable $\times_2$ is by a multiplication table,
as follows:

$$\begin{array}{c|ccccc}
\times_2 & 0 & 1 & 2 & 3 & 4 \\ \hline
0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 2 & 3 & 4 \\
2 & 0 & 2 & 1 & 4 & 3 \\
3 & 0 & 3 & 4 & 1 & 2 \\
4 & 0 & 4 & 3 & 2 & 1
\end{array}$$

A verbal description of the multiplication of the elements 2, 3, and 4 is
this: the product of two distinct ones among them is the third. Compare
Problem 8. The distributive law does indeed break down:

$$2 \times_2 (3 + 4) = 2 \times_2 2 = 1,$$

but

$$(2 \times_2 3) + (2 \times_2 4) = 4 + 3 = 2.$$


Comment. This is far from the only solution. To get another one, let F
be {0, 1} with + being addition and $\times_1$ being multiplication modulo 2; in
this case $(F, +, \times_1)$ is a field. If, on the other hand, $\times_2$ is defined by the
ridiculous equation

$$\alpha \times_2 \beta = 1$$

for all $\alpha$ and $\beta$, then

$$1 \times_2 (1 + 1) = 1$$

but

$$(1 \times_2 1) + (1 \times_2 1) = 1 + 1 = 0.$$


Solution 19. 


The answer is yes, there does exist a field with four elements, but the proof 
is not obvious. An intelligent and illuminating approach is to study the 
set P of all polynomials with coefficients in a field and “reduce” that set 
“modulo” some particular polynomial, the same way as the set Z of integers 
is reduced modulo a prime number p to yield the field Zp. 

Logically, the right coefficient field to start with for the purpose at
hand is $\mathbb{Z}_2$, but to get used to the procedure it is wise to begin with a more
familiar situation, which is not directly relevant. 

Let P be the set of all polynomials with coefficients in the field $\mathbb{Q}$ of
rational numbers, and let p be the particular polynomial defined by

$$p(x) = x^2 - 2.$$


Important observation: the polynomial p is irreducible. That means non- 
factorable, or, more precisely, it means that if p is the product of two poly- 
nomials with coefficients in Q, then one of them must be a constant. 

Let F be the result of “reducing P modulo p”. A quick way of explain- 
ing what that means is to say that the elements of F are the same as the 
elements of P (polynomials with rational coefficients), but the concept of 
equality is redefined: for present purposes two polynomials f and g are to 
be regarded as the same if they differ from one another only by a multiple 
of p. The customary symbol for “equality except possibly for a multiple of
p” is $\equiv$, and the relation it denotes is called congruence. In more detail: to
say that f is congruent to g modulo p, in symbols

$$f \equiv g \text{ modulo } p,$$

means that there exists a polynomial q (with rational coefficients) such that

$$f - g = pq.$$
What happens to the “arithmetic” of polynomials when equality is inter- 
preted modulo p? That is: what can be said about sums and products mod- 
ulo p? 


As far as the addition of polynomials of degree 0 and degree 1 is con- 
cerned, nothing much happens: 


(αx + β) + (γx + δ) = (α + γ)x + (β + δ),


just as it should be. When polynomials of degree 2 or more enter the pic- 
ture, however, something new happens. Example: if 


f(x) = x²  and  g(x) = −2,
then
f(x) + g(x) ≡ 0 (modulo p).
Reason: f + g is a multiple of p (namely p · 1) and therefore


(f + g) − 0 ≡ 0 modulo p.

Once that is accepted, then even multiplication offers no new sur-
prises. If, for instance,


f(x) = g(x) = x,
then


f · g ≡ 2 (modulo p);


indeed, f · g − 2 = p.

What does a polynomial look like, modulo p? Since x² can always be
replaced by 2 (is “equal” to 2), and, consequently, x³ (= x · x²) can be re-
placed by 2x, and x⁴ (= x² · x²) can be replaced by 4, etc., it follows that
every polynomial is “equal” to a polynomial of degree 0 or 1. Once that is 
agreed to, it follows with almost no pain that F is a field. Indeed, the veri- 
fication that F with addition (modulo p) is an abelian group takes nothing 
but a modicum of careful thinking about the definitions. The same state- 
ment about the set of non-zero elements of F with multiplication (modulo 
p) takes a little more thought: where do inverses come from? The clue to 
the answer is in the following computation: 


1/(α + βx) = (α − βx)/(α² − 2β²).

Familiar? Of course it is: it is the same computation as the rationalization
of the denominator that was needed to prove that Q(√2) is a field. All
the hard work is done; the distributive laws give no trouble, and the happy
conclusion is that F is a field, and, in fact, except for notation it is the same
as the field Q(√2).
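
For readers who like to experiment, here is a small sketch of the arithmetic of F. It assumes that the class of α + βx is stored as the pair (α, β) of rational numbers; the helper names mul and inverse are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction

def mul(u, v):
    """Multiply a + b*x by c + d*x, replacing x^2 by 2."""
    a, b = u
    c, d = v
    return (a * c + 2 * b * d, a * d + b * c)

def inverse(u):
    """Reciprocal of a + b*x, computed as (a - b*x) / (a^2 - 2*b^2)."""
    a, b = u
    norm = a * a - 2 * b * b  # zero only for (0, 0), since 2 has no rational square root
    if norm == 0:
        raise ZeroDivisionError("0 has no reciprocal")
    return (a / norm, -b / norm)

x = (Fraction(0), Fraction(1))
one = (Fraction(1), Fraction(0))
u = (Fraction(3), Fraction(5))

print(mul(x, x))           # the class of x^2
print(mul(u, inverse(u)))  # should be the class of 1
```

The denominator α² − 2β² is where the irreducibility of p is used: it cannot vanish for a non-zero pair of rationals.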

The same technique can be applied to many other coefficient fields 
and many other moduli. Consider, to be specific, the field Z₂, and let P
this time be the set of all polynomials


α₀ + α₁x + α₂x² + ··· + αₙxⁿ


of all possible degrees, with coefficients in Z₂. (Caution: 5x + 3 means


(x + x + x + x + x) + (1 + 1 + 1);


it is a polynomial, and it is equal to x + 1 modulo 2. It is dangerous to jump
to the conclusion that the polynomial x⁵ + x³, which means xxxxx + xxx,
can be reduced similarly.) The set P of all such polynomials is an abelian
group with respect to addition (modulo 2, of course); thus, for example,
the sum of


x⁵ + x³ + x + 1
and


x³ + x² + x
is


x⁵ + x² + 1.
Polynomials admit a natural commutative multiplication also (example:
(x² + 1)(x³ + x) = x⁵ + x),


with a unit (the constant polynomial 1), and addition and multiplication 
together satisfy the distributive laws. Not all is well, however; multiplica- 
tive inverses cause trouble. Example: there is no polynomial f such that 
xf(x) = 1; the polynomial x (different from 0) has no reciprocal. In this
respect polynomials behave as the integers do: the reciprocal of an integer
n is not an integer (unless n = 1 or n = −1). Just as for integers, reduction
by a suitable modulus can cure the disease. A pertinent modulus for the
present problem is x² + x + 1.

Why is it pertinent? Because reduction modulo a polynomial of de- 
gree k, say, converts every polynomial into one of degree less than k, and 
modulo 2 there are, for each k, exactly 2ᵏ polynomials of degree less than
k. That's clear, isn't it? To determine a polynomial of degree k − 1 or
less, the number of coefficients that has to be specified is k, and there are 
two choices, namely 0 and 1, for each coefficient. If we want to end up 
with exactly four polynomials that constitute a field with four elements, 
the value of k must therefore be 2. Modulo 2 the four polynomials of de-
gree less than 2 are 0, 1, x, and x + 1. Just as the modulus by which the
integers must be reduced to get a field must be a prime (an unfactorable,
irreducible number), the modulus by which the polynomials must be re-
duced here should be an unfactorable, irreducible polynomial. Modulo 2
there are exactly four polynomials of degree exactly 2, namely the result of
adding one of 0, 1, x, or x + 1 to x². Three of those, namely


x² = x · x,


x² + 1 = (x + 1)(x + 1),
and
x² + x = x(x + 1)


are factorable; the only irreducible polynomial of degree 2 is x² + x + 1.
The reduced objects, the four polynomials


0, 1, x, x + 1




are added (modulo 2) the obvious way; the modulus does not enter. It does 
enter into multiplication. Thus, for instance, to multiply modulo x² + x + 1,
first multiply the usual obvious way and then throw away multiples of
x² + x + 1. Example: x³ = 1 (modulo x² + x + 1). Reason:
x³ = x(x²) = x((x² + x + 1) + (x + 1)) = x(x + 1)
= x² + x = (x² + x + 1) + 1 = 1.
The multiplication table looks like this:

  ×    |  0     1     x     x+1
 ------+-------------------------
  0    |  0     0     0     0
  1    |  0     1     x     x+1
  x    |  0     x     x+1   1
  x+1  |  0     x+1   1     x


The inspiration is now over; what remains is routine verification. The result 
is that with addition and multiplication as described the four polynomials 
0, 1, x, x + 1 do indeed form a field.
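
The routine verification can be delegated to a machine. The sketch below assumes a particular encoding, chosen here and not in the text: a polynomial over the integers modulo 2 is stored as an integer whose binary digits are its coefficients, so that 0, 1, x, x + 1 become 0, 1, 2, 3. The names MODULUS, NAMES, add, and mul are illustrative.

```python
MODULUS = 0b111  # x^2 + x + 1 in the assumed bit encoding
NAMES = {0: "0", 1: "1", 2: "x", 3: "x+1"}

def add(a, b):
    """Addition of coefficients modulo 2 is bitwise exclusive or."""
    return a ^ b

def mul(a, b):
    """Multiply the usual way, then throw away multiples of x^2 + x + 1."""
    product = 0
    shift = 0
    while b >> shift:
        if (b >> shift) & 1:
            product ^= a << shift
        shift += 1
    # Cancel every term of degree 2 or more, highest degree first.
    for degree in range(product.bit_length() - 1, 1, -1):
        if (product >> degree) & 1:
            product ^= MODULUS << (degree - 2)
    return product

elements = [0, 1, 2, 3]
for a in elements:
    print(NAMES[a].ljust(4), [NAMES[mul(a, b)] for b in elements])

# Every non-zero element should have a reciprocal among the non-zero elements.
has_inverses = all(any(mul(a, b) == 1 for b in elements[1:]) for a in elements[1:])
print(has_inverses)
```

The bit encoding is only a convenience; any representation of the four polynomials would serve equally well.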

To construct a field with nine elements, proceed similarly: use polyno- 
mials with coefficients in the field of integers modulo 3 and reduce modulo 
the polynomial x² + 2x + 2.

Is there a field with six elements? The answer is no. The proof depends 
on a part of vector space theory that will be treated later, and the fact itself 
has no contact with the subject of this book. The general theorem is that 
the number of elements in a finite field is always a power of a prime, and 
that for every prime power there is one (and except for change of notation 
only one) finite field with that many elements. 
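
The nine-element construction can be checked the same way. The sketch below is an illustration under stated assumptions: the modulus is taken to be the quadratic x² + 2x + 2 with coefficients modulo 3 (nine elements require degree 2), an element a + bx is stored as the pair (a, b), and the names has_root and mul are hypothetical. It uses the fact that a polynomial of degree 2 over a field is irreducible exactly when it has no root in that field.

```python
P = 3  # coefficients are integers modulo 3

def has_root(c1, c0):
    """Does x^2 + c1*x + c0 have a root modulo 3?"""
    return any((t * t + c1 * t + c0) % P == 0 for t in range(P))

def mul(u, v):
    """Multiply a + b*x by c + d*x modulo 3 and modulo x^2 + 2x + 2."""
    a, b = u
    c, d = v
    # (a + bx)(c + dx) = ac + (ad + bc)x + bd*x^2, and x^2 = -2x - 2 = x + 1 modulo 3
    return ((a * c + b * d) % P, (a * d + b * c + b * d) % P)

elements = [(a, b) for a in range(P) for b in range(P)]
nonzero = [u for u in elements if u != (0, 0)]
invertible = all(any(mul(u, v) == (1, 0) for v in nonzero) for u in nonzero)

print(has_root(2, 2), len(elements), invertible)
```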


Chapter 2. Vectors 


Solution 20. 


The scalar zero law is a consequence of the other conditions; here is how 
the simple proof goes. If x is in V, then


0x + 0x = (0 + 0)x    (by the vector distributive law)
        = 0x,


and therefore, simply by cancellation in the additive group V, the forced
conclusion is that 0x = 0.

As for the vector zero law, the scalar distributive law implies that α0
is always zero. Indeed:


α0 + α0 = α(0 + 0) = α0,


and therefore, simply by cancellation in the additive group V, the forced
conclusion is that α0 = 0.

It is good to know that these two results about 0 are in a sense best 
possible. That is: if αx = 0, then either α = 0 or x = 0. Reason: if αx = 0
and α ≠ 0, then


x = 1x = ((1/α)α)x = (1/α)(αx)    (by the associative law),


which implies that


x = (1/α)0 = 0.


Comment. If a scalar multiplication satisfies all the conditions in the def-
inition of a vector space, how likely is it that αx = x? That happens when
x = 0 and it happens when α = 1; can it happen any other way? The an-
swer is no, and, by now, the proof is easy: if αx = x, then (α − 1)x = 0,
and therefore either α − 1 = 0 or x = 0.

A pertinent comment is that every field is a vector space over itself.
Isn't that obvious? All it says is that if, given F, the space V is de-
fined to be F itself, with addition in V being what it was in F and scalar
multiplication being ordinary multiplication in F, then the conditions in
the definition of a vector space are automatically satisfied. Consequence:
if F is a field, then the equation 0α = 0 in F is an instance of the scalar
zero law. In other words, the solution of Problem 17 (a) is a special case of
the present one.


Solution 21. 
(1) The scalar distributive law fails: indeed 
2 ∗ 1 = 2² · 1 = 4,
but
1 ∗ 1 + 1 ∗ 1 = 1 · 1 + 1 · 1 = 2.


The verifications that all the other axioms of a vector space are satisfied 
are painless routine. 
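
A two-line experiment makes the failure in (1) concrete. The sketch assumes, as the displayed computation suggests, that the modified scalar product sends α and x to α²x; the function name star is illustrative.

```python
def star(alpha, x):
    """Assumed modified scalar product: alpha squared times x."""
    return alpha ** 2 * x

lhs = star(1 + 1, 1)           # (1 + 1) * 1
rhs = star(1, 1) + star(1, 1)  # 1 * 1 + 1 * 1
print(lhs, rhs)                # unequal, so the distributive law in question fails
```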

(2) The scalar identity law fails; all other conditions are satisfied.

(3) Since the mapping α ↦ α² is multiplicative ((αβ)² = α²β²), the
associative law for the new scalar product is true (this should be checked,
and it is fun to check). The new scalar identity law follows from the fact
that 1² = 1. The verification of the new scalar distributive law depends
on the fact that if α and β are scalars (in the present sense, a very special
case), then


(α + β)² = α² + β².


(That identity holds, in fact, if and only if the field has “characteristic 2”,
which means that α + α = 0 for every α in F. An equivalent way of ex-
pressing that condition is just to say that 2 = 0, where “2” means 1 + 1, of
course.) The scalar distributive law, however, is false. Indeed:


1((1,0) + (0,1)) = 1(1, 1) = (1,1), 


whereas 


1(1,0) + 1(0, 1) = (1 + 1,0) + (0,1) = (1 + 1,1). 


(4) Nothing is missing; the new definitions of addition and scalar
multiplication do indeed make R₊ into a real vector space.
(5) In this example the associative law fails. Indeed, if α = β = i, then


(αβ) · 1 = (−1)1 = −1,
whereas
α · (β · 1) = α · (0) = 0.


The verifications of the distributive laws (vector or scalar), and of the scalar 
identity law, are completely straightforward; all that they depend on (in 
addition to the elementary properties of the addition of complex numbers) 
is that Re does the right thing with 0, 1, and +. (The right thing is Re 0 = 0,
Re 1 = 1, and Re(α + β) = Re α + Re β.)
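
The computation in (5) is short enough to replay with complex numbers in Python. The sketch assumes that the modified scalar product is α · x = (Re α)x, which is what the displayed equations use; the name dot is illustrative.

```python
def dot(alpha, x):
    """Assumed modified scalar product: the real part of alpha times x."""
    return alpha.real * x

i = 1j
lhs = dot(i * i, 1)      # (alpha beta) . 1 with alpha = beta = i
rhs = dot(i, dot(i, 1))  # alpha . (beta . 1)
print(lhs, rhs)          # unequal, so the associative law fails
```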

(6) Here, once more, nothing is missing. The result is a special case of 
the general observation that if F is a field and G is a subfield, then F is a 
vector space over G. 


Question. What is the status of the zero laws (scalar and vector) in these 
examples? The proof that they held (Problem 20) depended on the truth 
of the other conditions; does the failure of some of those conditions make 
the zero laws fail also? 


Comment. Examples (1), (2), (3), and (5) show that the definition of vector
spaces by four axioms contains no redundant information. A priori it
is conceivable that some cleverly selected subset of those conditions (con- 
sisting of three, or two, or even only one) might be strong enough to imply 
the others. There are 15 non-empty subsets, and a detailed study of all pos- 
sibilities threatens to be more than a little dull. An examination of some of 
those possibilities can, however, be helpful in coming to understand some 
of the subtleties of the algebra of scalars and vectors, and that’s what the ex- 
amples (1), (2), (3), and (5) have provided. Each of them shows that some 
particular one of the four conditions is independent of the other three: 
they provide concrete counterexamples (of F, V, and a scalar multiplica- 
tion defined between them) in which three conditions hold and the fourth 
fails. 

Despite example (5), the associative law is almost a consequence of the 
others. If, to be specific, the underlying field is Q, and if V is a candidate for 
a vector space over Q, equipped with a scalar multiplication that satisfies 
the two distributive laws and the scalar identity law, then it satisfies all the 
other conditions, and, in particular, it satisfies the associative law also, so 
that V is an honest vector space over Q. The proof is not especially difficult, 
but it is of not much use in linear algebra; what follows is just a series of 
hints. 

The first step might be to prove that 2x is necessarily equal to x + x, and
that, more generally, for each positive integer m, the scalar product mx is
the sum of m summands all equal to x. This much already guarantees that
(αβ)x = α(βx) whenever α and β are positive integers. To get the general
associative law two more steps are necessary. One: recall that 0 · x = 0 and
(−1)x = −x (compare the corresponding discussions of the status of the
other vector space axioms); this yields the associative law for all integers.
Two: ½x + ½x = x, and, more generally, the sum of n summands all equal
to (1/n)x is equal to x; this yields the associative law for all reciprocals of
integers. Since every rational number has the form m · (1/n), where m and n
are integers, the associative law follows for all elements of Q. Caution: the
reader who wishes to flesh out this skeletal outline should be quite sure
that the lemmas needed (for example (−1)x = −x) can be proved without
the use of the associative law.

A similar argument can be used to show that if the underlying field is
the field of integers modulo a prime p, then, again, the associative law is
a consequence of the others. These facts indicate that for a proof of the
independence of the associative law the field has to be more complicated
than Q or Zₚ. (Reminder: fields such as Zₚ occurred in the discussion pre-
ceding Problem 19.) A field that is complicated enough is the field C of
complex numbers; that is what the counterexample (5) shows.


Solution 22. 
It’s easy enough to verify that 
3(1, 1) — 1(1,2) = (2,1) 
and 
—1(1, 1) + 1(1, 2) = (0, 1), 


so that (2, 1) and (0, 1) are indeed linear combinations of (1, 1) and (1, 2), 
but these equations don’t reveal any secrets; the problem is where do they 
come from—how can they be discovered? 

The general question is this: for which vectors (α, β) can real numbers
ξ and η be found so that


ξ(1, 1) + η(1, 2) = (α, β)?


In terms of coordinates this vector equation amounts to two numerical
equations:


ξ + η = α,
ξ + 2η = β.


To find the unknowns ξ and η, subtract the top equation from the bottom
one to get


η = β − α,
and then substitute the result back in the top equation to get
ξ + β − α = α,
or, in other words,
ξ = 2α − β.


That's where the unknown coefficients come from, and, once derived, the
consequence is easy enough to check:


(2α − β)(1, 1) + (β − α)(1, 2) = (2α − β + β − α, 2α − β + 2β − 2α)
= (α, β).


Conclusion: every vector in R² is a linear combination of (1, 1) and (1, 2).
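
The elimination just described is easily turned into a small program. This is only a sketch; the function names coefficients and combine are chosen here for illustration.

```python
from fractions import Fraction

def coefficients(alpha, beta):
    """Return (xi, eta) with xi*(1, 1) + eta*(1, 2) == (alpha, beta)."""
    eta = beta - alpha  # bottom equation minus top equation
    xi = alpha - eta    # substitute back in the top equation; equals 2*alpha - beta
    return xi, eta

def combine(xi, eta):
    """Form the linear combination xi*(1, 1) + eta*(1, 2)."""
    return (xi * 1 + eta * 1, xi * 1 + eta * 2)

for target in [(2, 1), (0, 1), (Fraction(1, 2), Fraction(-7, 3))]:
    xi, eta = coefficients(*target)
    print(target, (xi, eta), combine(xi, eta) == target)
```

The first two targets recover the coefficients 3, −1 and −1, 1 that were pulled out of a hat at the beginning of the solution.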

The process of solving two linear equations in two unknowns (elim-
inate one of the unknowns and then substitute) is itself a part of linear
algebra. It is used here without any preliminary explanation because it is
almost self-explanatory and most students learn it early. (Incidentally: in
this context the phrase linear equations means equations of first degree,
that is, typically, equations of the form


αξ + βη + γ = 0
in the two unknowns ξ and η.)


Solution 23. 


For (a) the sets described by (1), (2), and (4) are subspaces and the sets 
described by (3), (5), and (6) are not. The proofs of the positive answers 
are straightforward applications of the definition; the negative answers de- 
serve at least a brief second look. 

(3) The vector 0 (= (0, 0, 0)) does not satisfy the condition.

(5) The vector (1, 1, 1) satisfies the condition, but its product by i
(= √−1) does not.

(6) The vector (1, 1, 1) satisfies the condition, but its product by i does
not.

For (b) the sets described by (2) and (4) are subspaces and the sets 
described by (1) and (3) are not. The proofs of the positive answers are 
straightforward. For the negative answers: 

(1) The polynomials x³ + x and −x³ + 2 satisfy the condition, but their
sum does not.

(3) The polynomial x² satisfies the condition, but its product by i
(= √−1) does not.


Comment. The answers (5) for (a) and (3) for (b) show that the sets M in-
volved are not subspaces of the complex vector spaces involved. What
would happen, though, if C³ in (a) were replaced by R³, and, similarly, the com-
plex vector space P in (b) were replaced by the corresponding real vector
space? Answer: the results would stay the same (negative): just replace “i”
by “−1”.


Solution 24. 


(a) The intersection of any collection of subspaces is always a subspace. 
The proof is just a matter of language: it is contained in the meaning of 
the word "intersection". Suppose, indeed, that the subspaces forming a
collection are distinguished by the use of an index γ; the problem is to
prove that if each Mᵧ is a subspace, then the same is true of M = ⋂ᵧ Mᵧ.
Since every Mᵧ contains 0, so does M, and therefore M is not empty. If x
and y belong to M (that is to every Mᵧ), then αx + βy belongs to every
Mᵧ (no matter what α and β are), and therefore αx + βy belongs to M.
Conclusion: M is a subspace.

(b) If one of two given subspaces is the entire vector space V, then 
their union is V; the question is worth considering for proper subspaces 
only. If M₁ and M₂ are proper subspaces, can M₁ ∪ M₂ be equal to V? No,
never. If one of the subspaces includes the other, then their union is equal
to the larger one, which is not equal to V. If neither includes the other, the
reasoning is slightly more subtle; here is how it goes.

Consider a vector x in M₁ that is not in M₂, and consider a vector y
that is not in M₁ (it doesn't matter whether it is in M₂ or not). The set of
all scalar multiples of x, that is the set of all vectors of the form αx, is a line
through the origin. (The geometric language doesn't have to be used, but it
helps.) Translate that line by the vector y, that is, form the set of all vectors
of the form αx + y; the result is a parallel line (not through the origin).
Being parallel, the translated line has no vectors in common with M₁. (To
see the geometry, draw a picture; to understand the algebra, write down a
precise proof that αx + y can never be in M₁.) How many vectors can the
translated line have in common with M₂? Answer: at most one. Reason: if
both αx + y and βx + y are in M₂, with α ≠ β, then their difference (α − β)x
would be in M₂, and division by α − β would yield a contradiction. It is a
consequence of these facts that the set L of all vectors of the form αx + y
(a line) has at most one element in common with M₁ ∪ M₂. Since there are
as many vectors in L as there are scalars (and that means at least two), it
follows that M₁ ∪ M₂ cannot contain every vector in V.

Granted that V cannot be the union of two proper subspaces, how 
about three? As an example of the sort of thing that can happen, consider 
the field F of integers modulo 2; the set F² of all ordered pairs of elements
of F is a vector space in the usual way. The subset


{(0, 0), (0, 1)}


is a subspace of F², and so are the subsets


{(0, 0), (1, 0)}


and


{(0, 0), (1, 1)}.
The set-theoretic union of these three subspaces is all of F²; this is an ex-
ample of a vector space that is the union of three proper subspaces of itself.
The example looks degenerate, in a sense: the vector space has only a fi- 
nite number of vectors in it, and it should come as no surprise that it can 
be the union of a finite number of proper subspaces. Every vector space 
is the union of its “lines”, and in the cases under consideration there are 
only a finite number of them. 
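
The example is small enough to enumerate. The following sketch lists the three lines of the plane over the integers modulo 2, spanned by (0, 1), (1, 0), and (1, 1), and checks that their union is the whole space; the helper name span is illustrative.

```python
from itertools import product

def span(v):
    """All scalar multiples of v, the scalars being 0 and 1."""
    return {tuple((c * t) % 2 for t in v) for c in (0, 1)}

space = set(product((0, 1), repeat=2))            # the four vectors of the plane
lines = [span(v) for v in [(0, 1), (1, 0), (1, 1)]]
union = set().union(*lines)

print(union == space)                             # the three lines cover the space
print(all(line != space for line in lines))       # each line is a proper subspace
```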

Under these circumstances, the intelligent thing to do is to ask about 
infinite fields, and, sure enough, it turns out that a vector space over an 
infinite field is never the union of a finite number of proper subspaces; the 
proof is just a slight modification of the one that worked for n = 2 and all 
fields (infinite or not). 

Suppose, indeed, that M₁, ..., Mₙ are proper subspaces such that
none of them is included in the union of the others. From the present point
of view that assumption involves no loss of generality; if one of them is
included in the union of the others, just omit it, and note that the only
effect of the omission is to reduce the number n to n − 1. It follows that
there exists a vector x₁ in M₁ that does not belong to Mⱼ for j ≠ 1, and
(since M₁ is not the whole space) there exists a vector x₀ that does not
belong to M₁.

Consider the line through x₀ parallel to x₁. Precisely: let L be the set of
all vectors of the form x₀ + αx₁ (α a scalar). How large can the intersections
L ∩ Mⱼ be (where j = 1, ..., n)? Since x₁ belongs to M₁ it follows that
x₀ + αx₁ cannot belong to M₁ (for otherwise x₀ would also); this proves
that L ∩ M₁ = ∅. As for the sets L ∩ Mⱼ with j ≠ 1, they can contain no
more than one vector each. Reason: if both x₀ + αx₁ and x₀ + βx₁ belong
to Mⱼ, then so does their difference, (α − β)x₁, and, since x₁ is not in Mⱼ,
that can happen only when α = β.

Since (by hypothesis) there are infinitely many scalars, the line L con-
tains infinitely many vectors. Since, however, by the preceding paragraph,
the number of elements in L ∩ (M₁ ∪ ··· ∪ Mₙ) is less than n, it follows
that M₁ ∪ ··· ∪ Mₙ cannot cover the whole space; the proof is complete.

What the argument depends on is a comparison between the cardinal 
number of the ground field and a prescribed cardinal number n. Related 
theorems are true for certain related structures. One example: a group 
is never the union of two proper subgroups. Another example: a Banach 
space is never the union of a finite or countable collection of closed proper 
subspaces. 


Caution. Even if the ground field is uncountable (has cardinal number
greater than ℵ₀, as does R for instance), it is possible for a vector space
to be the union of a countably infinite collection of proper subspaces. Ex-
ample: the vector space P of all real polynomials is the union of the sub-
spaces Pₙ, consisting of all polynomials of degree less than or equal to n,
n = 1, 2, 3, ....


Solution 25. 


(a) Sure, that’s easy; just consider, for instance, the sets {(1, 0), (0, 1)} and
{(2, 0), (0, 2)}. That answers the question, but it seems dishonest; could
a positive answer have been obtained so that no vector in either set is a
scalar multiple of a vector in the other set? Yes, and that's easy too, but
it requires a few more seconds of thought. One example is {(1, 0), (0, 1)}
and {(1, 1), (1, −1)}.

(b) The span of {(1, 1, 1), (0, 1, 1), (0, 0, 1)} is R³, or, in other words,
every vector in R³ is a linear combination of the three vectors in the set.

Why? Because no matter what vector (α, β, γ) is prescribed, coeffi-
cients ξ, η, and ζ can be found so that


ξ(1, 1, 1) + η(0, 1, 1) + ζ(0, 0, 1) = (α, β, γ).
In fact this one vector equation says the same thing as the three scalar
equations
ξ = α,
ξ + η = β,
ξ + η + ζ = γ,


and those are easy equations to solve. The solution is


ξ = α,
η = β − ξ = β − α,
ζ = γ − ξ − η = γ − α − (β − α) = γ − β.
Check:


α(1, 1, 1) + (β − α)(0, 1, 1) + (γ − β)(0, 0, 1) = (α, β, γ).
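
The three scalar equations are solved from the top down, and that procedure can be written out as a short sketch. The function names solve3 and combine are illustrative, not part of the text.

```python
def solve3(alpha, beta, gamma):
    """Solve xi = alpha, xi + eta = beta, xi + eta + zeta = gamma in that order."""
    xi = alpha
    eta = beta - xi
    zeta = gamma - xi - eta
    return xi, eta, zeta

def combine(xi, eta, zeta):
    """Form xi*(1, 1, 1) + eta*(0, 1, 1) + zeta*(0, 0, 1) coordinate by coordinate."""
    basis = [(1, 1, 1), (0, 1, 1), (0, 0, 1)]
    return tuple(xi * u + eta * v + zeta * w for u, v, w in zip(*basis))

target = (4, 9, 2)
print(solve3(*target), combine(*solve3(*target)) == target)
```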


Comment. The span of the two vectors (0, 1, 1) and (0, 0, 1) is the set of
all (0, ξ, ξ + η), which is in fact the (η, ζ)-plane. The span of the two vectors
(1, 1, 1) and (0, 1, 1) is the plane consisting of the set of all (ξ, ξ + η, ξ + η),
and the span of (1, 1, 1) and (0, 0, 1) is still another plane.


Solution 26.
Yes, it follows. To say that x ∈ ⋁{M, y} means that there exists a vector z
in M and there exist scalars α and β such that

x = αy + βz.


It follows, of course, that


αy = x − βz,


and, moreover, that α ≠ 0; the latter because otherwise x would belong
to M, contradicting the assumption. Conclusion:


y ∈ ⋁{M, x},
and that implies the equality of the spans ⋁{M, x} and ⋁{M, y}.


Solution 27. 


(a) No, there is no vector that spans R². Indeed, for each vector (x, y) in
R², its span is the set of all scalar multiples of it, and that can never contain
every vector. Reason: if x = 0, then (1, 0) is not a multiple of (x, y), and if
x ≠ 0, then (x, y + 1) is not a multiple of (x, y).
(b) Yes, there are two vectors that span R², many ways. One obvious
example is (1, 0) and (0, 1); another is (1, 1) and (1, −1) (see Problem 25).
(c) No, no two vectors can span R³. Suppose, indeed, that


x = (x₁, x₂, x₃) and y = (y₁, y₂, y₃)


are any two vectors in R³; the question is whether for an arbitrary z =
(z₁, z₂, z₃) coefficients α and β can be found so that αx + βy = z. In other
words, for given (x₁, x₂, x₃) and (y₁, y₂, y₃) can the equations


αx₁ + βy₁ = z₁,
αx₂ + βy₂ = z₂,
αx₃ + βy₃ = z₃,


be solved for the unknowns α and β, no matter what z₁, z₂, and z₃ are? The
negative answer can be proved either by patiently waiting till the present
discussion of linear algebra reaches the pertinent discussion of dimension
theory, or by making use of known facts about the solution of three equa-
tions in two unknowns (which belongs to the more general context of sys-
tems with more equations than unknowns). In geometric language the facts
can be expressed by saying that all linear combinations of x and y are con-
tained in a single plane.

(d) No, no finite set of vectors spans the vector space P of all poly- 
nomials (no matter what the underlying coefficient field is). The reason 
is that polynomials have degrees. In a finite set of polynomials there is 
one with maximum degree; no linear combination of the set will produce 
a polynomial with greater degree than that. Since P contains polynomials 
of all degrees, the span of the finite set cannot exhaust P. Compare the 
cautionary comment at the end of Solution 24. 


Solution 28. 


The modular identity does hold for subspaces. 

The easy direction is ⊃: the right side is included in the left. Reason:
L ∩ M ⊂ L (obviously) and L ∩ N ⊂ M + (L ∩ N). In other words, both
summands on the right are included in the left, and, therefore, so is their
sum.

The reverse direction takes a little more insight. If x is a vector in the
left side, then x ∈ L and x = y + z with y ∈ M and z ∈ L ∩ N. Since
y = x − z, and since −z belongs to L ∩ N along with z, so that, in particular,
−z ∈ L, it follows that y ∈ L. Since by the choice of notation, y ∈ M, it
follows that y ∈ L ∩ M, and hence that


x ∈ (L ∩ M) + (L ∩ N),


as promised. 
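
In a small enough space the identity can also be confirmed by brute force. The sketch below is an experiment, not a proof: it assumes the three-dimensional coordinate space over the integers modulo 2 (a choice made here for finiteness), reads the identity as L ∩ (M + (L ∩ N)) = (L ∩ M) + (L ∩ N), and uses the hypothetical helper names add, span, and subspace_sum.

```python
from itertools import combinations, product

VECTORS = list(product((0, 1), repeat=3))

def add(u, v):
    """Coordinatewise addition modulo 2."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

def span(generators):
    """Smallest subspace containing the generators (the only scalars are 0 and 1)."""
    subspace = {(0, 0, 0)}
    for g in generators:
        subspace |= {add(v, g) for v in subspace}
    return frozenset(subspace)

def subspace_sum(a, b):
    """The sum of two subspaces: all u + v with u in a and v in b."""
    return frozenset(add(u, v) for u in a for v in b)

# Every subspace of a 3-dimensional space is spanned by at most 3 vectors.
subspaces = {span(gens) for r in range(4) for gens in combinations(VECTORS, r)}

modular = all(
    (sub_l & subspace_sum(sub_m, sub_l & sub_n))
    == subspace_sum(sub_l & sub_m, sub_l & sub_n)
    for sub_l in subspaces for sub_m in subspaces for sub_n in subspaces
)
print(len(subspaces), modular)
```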


Solution 29. 


The question is when do addition and intersection satisfy the distributive 
law. Half the answer is obvious: the right side is included in the left. Rea-
son: both L ∩ M and L ∩ N are included in both L and M + N.

As for the other half, if every vector in V is a scalar multiple of a par-
ticular vector x, then V has very few subspaces: in fact, only two, O and V.
In that case the distributive law for subspaces is obviously true; in all other
cases it's false.

Suppose, indeed, that V contains two vectors x and y such that neither
one is a scalar multiple of the other. (Look at a picture in R².) If L, M, and
N are the sets of all scalar multiples of x + y, x, and y, respectively, then
L ∩ M and L ∩ N are O, so that the right side is O, whereas M + N includes
L, so that the left side is L.
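
The counterexample can be exhibited in the smallest possible setting. The sketch below replaces the picture in the real plane by the plane over the integers modulo 2 (an assumption made here so that the subspaces are finite sets), with x = (1, 0) and y = (0, 1); the helper names are illustrative.

```python
def add(u, v):
    """Coordinatewise addition modulo 2."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

def span(v):
    """All scalar multiples of v; modulo 2 these are 0 and v."""
    return frozenset({(0, 0), v})

def subspace_sum(a, b):
    """The sum of two subspaces: all u + w with u in a and w in b."""
    return frozenset(add(u, w) for u in a for w in b)

x, y = (1, 0), (0, 1)
L, M, N = span(add(x, y)), span(x), span(y)

left = L & subspace_sum(M, N)         # L intersected with M + N
right = subspace_sum(L & M, L & N)    # (L meet M) + (L meet N)
print(sorted(left), sorted(right))
```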


Solution 30.


For most total sets E in a vector space V it is easy to find a subspace M that
has nothing in common with E. For a specific example, let V be R² and let
E be {(1, 0), (0, 1)}; the subspace M spanned by (1, 1) is disjoint from E.


Solution 31. 
The answers are yes, and the proofs are easy. 
If x₀ = Σ_{j=1}^{n} αⱼxⱼ, then put α₀ = −1 and note that Σ_{j=0}^{n} αⱼxⱼ = 0.
Since not all the scalars α₀, α₁, ..., αₙ are 0 (because at least α₀ is not), it
follows that the enlarged set {x₀, x₁, ..., xₙ} is dependent.


In the converse direction, if Σ_{j=0}^{n} αⱼxⱼ = 0, with not every αⱼ equal
to 0, then there is at least one index i such that αᵢ ≠ 0. Solve for xᵢ to get


xᵢ = −Σ_{j≠i} (αⱼ/αᵢ)xⱼ.

(The symbol Σ_{j≠i} indicates the sum extended over the
indices j different from i.) That's it: the last equation says that xᵢ is a linear
combination of the other x's.

It is sometimes convenient to regard a finite set {x₀, x₁, ..., xₙ} of vec-
tors as presented in order, the order of indices, and then to ask about the
dependence of the initial segments {x₀}, {x₀, x₁}, {x₀, x₁, x₂}, etc. The
proof given above yields the appropriate result. A more explicit statement
is this corollary: a set {x₀, x₁, ..., xₙ} of non-zero vectors is dependent if
and only if at least one of the vectors x₁, ..., xₙ is a linear combination of
the preceding ones. The important word is “preceding”. The proof of “if” is
trivial. The proof of “only if” is obtained from the second half of the proof
given above by choosing xᵢ to be the first vector after x₀ for which the set
{x₀, ..., xᵢ} is linearly dependent. (Caution: is it certain that there is such
an xᵢ?) The desired result is obtained by solving such a linear dependence
relation for xᵢ.


Solution 32. 


Yes, every finite-dimensional vector space has a finite basis; in fact, if E is 
a finite total set for V, then there exists an independent subset F of E that 
is a basis for V. The trick is to use Problem 31. 

If V = O, the result is trivial; there is no loss of generality in assum-
ing that V ≠ O. In that case suppose that E is a finite total set for V and
begin by asking whether 0 belongs to E. If it does, discard it; the resulting
set (which might as well be denoted by E again) is still total for V. If E is
independent, there is nothing to do; in that case F = E. If E is dependent,
then, by Problem 31, there exists an element of E that is a linear combi-
nation of the others. Discard that element, and note that the resulting set
(which might as well be denoted by E again) is still total for V. Keep re-
peating the argument of the preceding two sentences as long as necessary;
since E is finite, the repetitions have to stop in a finite number of steps.
The only thing that can stop them is arrival at an independent set, and that
completes the proof.
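
The discarding procedure is an algorithm, and it can be sketched as one. The version below assumes vectors with rational coordinates, given as tuples, and decides whether an element is a linear combination of the others by comparing ranks (a test chosen here for convenience; the text does not prescribe one). The names rank and prune_to_basis are hypothetical.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    width = len(rows[0]) if rows else 0
    r = 0
    for col in range(width):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def prune_to_basis(total):
    """Discard, one at a time, an element that is a combination of the others."""
    current = list(total)
    while True:
        full = rank(current)
        for k in range(len(current)):
            rest = current[:k] + current[k + 1:]
            if rank(rest) == full:  # current[k] depends on the rest (0 included)
                current = rest
                break
        else:
            return current          # nothing can be discarded: independent

E = [(1, 0), (0, 1), (1, 1), (0, 0), (2, 2)]
basis = prune_to_basis(E)
print(basis)
```

Which basis comes out depends on the order in which candidates are examined; the argument in the text makes no promise of uniqueness either.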


Chapter 3. Bases 


Solution 33. 


If T is a total set for a vector space V, and E is a finite independent set in
V, then there exists a subset F of T, with the same number of elements as
E, such that (T − F) ∪ E is total.

The proof is simplest in case E consists of a single non-zero vector
x. All that has to be done then is to express x as a linear combination
Σ_i αᵢyᵢ of vectors in T and find a coefficient αⱼ different from 0. From
x = Σ_i αᵢyᵢ it follows that yⱼ = (1/αⱼ)(x − Σ_{i≠j} αᵢyᵢ). If yⱼ is discarded
from T and replaced by x, the result is just as total as it was before, because
each linear combination of vectors in T is equal to a linear combination of
x and of vectors in T different from yⱼ.

In the general case, E = {x₁, ..., xₙ}, apply the result of the preceding
paragraph inductively to one x at a time. Begin, that is, by finding y₁ in T₁
(= T) so that T₂ = (T₁ − {y₁}) ∪ {x₁} is total. For the second step, find
y₂ in T₂ so that T₃ = (T₂ − {y₂}) ∪ {x₂} is total, and take an additional
minute to become convinced that T₃ contains x₁, that is that y₂ couldn't
have been x₁. The reason for the latter is the assumed independence of
the x's; if x₁ had been discarded from T₂, no linear combination of x₂,
together with the vectors that have not been discarded, could recapture it.
Keep going the same way, forming


Tₖ₊₁ = (Tₖ − {yₖ}) ∪ {xₖ},


till Tₙ₊₁ is reached. The result is a new total set obtained from T by changing
a subset F = {y₁, ..., yₙ} of T into the prescribed set


E = {x₁, ..., xₙ}.


The name of the result is the Steinitz exchange theorem.

The result has three useful corollaries.


Corollary 1. If E is an independent set and T is a total set in a finite-
dimensional vector space, then the number of elements in E is less than or equal to
the number of elements in T. 


Corollary 2. Any two bases for a finite-dimensional vector space have the 
same number of elements. 


The dimension of a finite-dimensional vector space V, abbreviated 
dim V, is the number of elements in a basis of V. 


Corollary 3. Every set of more than n vectors in a vector space V of di- 
mension n is dependent. A set of n vectors in V is a basis if and only if it is 
independent, or, alternatively, if and only if it is total. 


Note that these considerations answer, in particular, a question asked
long before (Problem 27), namely whether two vectors can span $\mathbb{R}^3$. Since
$\dim \mathbb{R}^3 = 3$, the answer is no.


Solution 34. 


If several subspaces of a space V of dimension n have a simultaneous com- 
plement, then they all have the same dimension, say m, so that that is at 
least a necessary condition. Assertion: if the coefficient field is infinite, then 
that condition is sufficient also: finite collections of subspaces of the same 
dimension m necessarily have simultaneous complements. 

If the common dimension m is equal to n, then each of the given subspaces
is equal to V (is it fair in that case to speak of "several" subspaces?),
and the subspace $\{0\}$ is a simultaneous complement, a thoroughly
uninteresting degenerate case. If m < n, then the given subspaces $M_1, \ldots, M_k$
are proper, and it follows from Problem 24 that there exists a vector x in V
that doesn't belong to any of them. If L is the 1-dimensional space spanned
by x, then $M_j \cap L = \{0\}$ for each j, and, moreover, all the subspaces


$$M_1 + L, \ldots, M_k + L$$


have dimension m + 1. Either m + 1 = n (in which case $M_j + L = V$ for
each j, and, in fact, L is a simultaneous complement of all the $M_j$'s), or
m + 1 < n, in which case the reasoning can be applied again. Applying
it inductively a total of n - m times produces the promised simultaneous
complement.

The generalization of Problem 24 to uncountable ground fields and
countable collections of proper subspaces is just as easy to apply as the 
ungeneralized version. Conclusion: if the ground field is uncountable, then 
countable collections of subspaces of the same dimension m necessarily 
have simultaneous complements. 


Solution 35. 


(a) If $\xi$ and 1 are linearly dependent, then there exist rational numbers $\alpha$
and $\beta$, not both 0, such that $\alpha \cdot 1 + \beta \cdot \xi = 0$. The coefficient $\beta$ cannot
be 0 (for if it were, then $\alpha$ too would have to be), and, consequently, this
dependence relation implies that $\xi = -\alpha/\beta$, and hence that $\xi$ is rational.


The reverse implication is equally easy: $\xi$ and 1 are linearly dependent if
and only if $\xi$ is rational.

(b) The solution of two equations in two unknowns is involved, namely
the equations


$$\alpha(1 + \xi) + \beta(1 - \xi) = 0$$
$$\alpha(1 - \xi) + \beta(1 + \xi) = 0$$


in the unknowns $\alpha$ and $\beta$. If $\xi \ne 0$, then $\alpha$ and $\beta$ must be 0; the only case
of linear dependence is the trivial one, (1, 1) and (1, 1).
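
The elimination behind part (b) can be written out in two lines. The following is only a sketch of one way to do it, assuming that the scalars are real or complex numbers (so that 2 is not 0): add the two equations, then subtract the second from the first.

```latex
\begin{aligned}
\alpha(1+\xi) + \beta(1-\xi) &= 0,\\
\alpha(1-\xi) + \beta(1+\xi) &= 0,\\[4pt]
\text{sum:}\qquad 2(\alpha + \beta) &= 0,\\
\text{difference:}\qquad 2\xi(\alpha - \beta) &= 0.
\end{aligned}
```

If $\xi \ne 0$, the last two lines give $\alpha = -\beta$ and $\alpha = \beta$, which together force $\alpha = \beta = 0$. If $\xi = 0$, the difference carries no information, and both vectors reduce to (1, 1).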


Solution 36. 


How about (x, 1, 0), (1, x, 1), and (0, 1, x)? The assumption of linear
dependence leads to three equations in three unknowns that form a
conspiracy: they imply that $x(x^2 - 2) = 0$. Consequence: x must be 0 or else $\pm\sqrt{2}$,
and, indeed, in each of those cases, linear dependence does take place.
That makes sense for $\mathbb{R}$, but not for $\mathbb{Q}$; in that case linear dependence can
take place only when x = 0.
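
The "conspiracy" can be checked by a different route from the one the solution takes. The sketch below uses a determinant, a tool the text does not rely on at this point: three vectors in a 3-dimensional space are dependent exactly when the determinant of the matrix they form is 0.

```latex
\det\begin{pmatrix} x & 1 & 0 \\ 1 & x & 1 \\ 0 & 1 & x \end{pmatrix}
  = x(x^2 - 1) - 1\cdot(x - 0) = x^3 - 2x = x(x^2 - 2),
\qquad
(\sqrt{2}, 1, 0) - \sqrt{2}\,(1, \sqrt{2}, 1) + (0, 1, \sqrt{2}) = (0, 0, 0).
```

The second formula is an explicit dependence relation for $x = \sqrt{2}$; for $x = -\sqrt{2}$ the relation $(-\sqrt{2}, 1, 0) + \sqrt{2}\,(1, -\sqrt{2}, 1) + (0, 1, -\sqrt{2}) = (0, 0, 0)$ works the same way, and for x = 0 the first and third vectors coincide.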


Solution 37. 


(a) If $(1, \alpha)$ and $(1, \beta)$ are to be linearly independent, then clearly $\alpha$ cannot
be equal to $\beta$, and, conversely, if $\alpha \ne \beta$, then linear independence does
take place.

(b) No, there is not enough room in $\mathbb{C}^2$ for three linearly independent
vectors; the trouble is that two equations in three unknowns are quite likely
to have a non-trivial solution. Better: $\mathbb{C}^2$ has dimension 2, and the existence
of three linearly independent vectors would imply that the dimension is at
least 3.


Solution 38. 


Why not? 

For (a) consider, for instance, two independent vectors in $\mathbb{C}^2$, such as
(1,0) and (1, —1), each of which is independent of (1, 1), and use them to 
doctor up the two given vectors. One possibility is to adjoin 


(0,0,1,0) and (1,0,0,0) 
to the first given pair and adjoin 
(0, 0,1,—1) and (1, —1,0, 0) 


to the second given pair. 
For (b), adjoin 


(0,0,1,0) and (0,0,1,1) 


to the first two vectors and adjoin 


(-1, 1, 0, 0) and (0, 1, 0, 0)


to the second two. 


Solution 39. 


(a) Never: there is too much room in $\mathbb{C}^3$. Better: since the dimension of
$\mathbb{C}^3$ is 3, two vectors can never constitute a basis in it.

(b) Never—the sum of the first two is the third—they are linearly de- 
pendent. 


Solution 40. 


How many vectors can there be in a maximal linearly independent set? 
Clearly not more than 4, and it doesn’t take much work to realize that any 
four of the six prescribed vectors are linearly independent. Conclusion: the 
answer is the number of 4-element subsets of a 6-element set, that is, $\binom{6}{4} = 15$.


Solution 41.


If x is an arbitrary non-zero vector in V, then x and ix (= $\sqrt{-1}\,x$) are
linearly independent over $\mathbb{R}$. (Reason: if $\alpha$ and $\beta$ are real numbers and if


$$\alpha x + \beta(ix) = 0,$$
then
$$(\alpha + \beta i)x = 0,$$


and since $x \ne 0$, it follows that $\alpha + \beta i = 0$, and hence that $\alpha = \beta = 0$.)
Consequence: if the vectors $x_1, x_2, x_3, \ldots$ constitute a basis in V, then the
same vectors, together with their multiples by i, constitute a basis in V
regarded as a real vector space. Conclusion: the "real
dimension" of V is 2n. Unsurprising corollary: the real dimension of $\mathbb{C}$
is 2.


Solution 42. 


Suppose, more generally, that M and N are finite-dimensional subspaces
of a vector space, with $M \subset N$. If $M \ne N$, then a basis for M cannot span
N. Take a basis for M and adjoin to it a vector in N that is not in M. The
result is a linearly independent set in N containing more elements than 
the dimension of M—which implies that M and N do not have the same 
dimension. Conclusion: if a subspace of N has the same dimension as N, 
then it must be equal to N. 


Solution 43. 


The answer is yes; every finite independent set in a finite-dimensional vec- 
tor space can be extended to a basis. The assertion (Problem 32) that in a 
finite-dimensional vector space there always exists a finite basis is a special 
case: it just says that the empty set (which is independent) can be extended 
to a basis. 

The proof of the general answer has only one small trap. Given a fi- 
nite independent set E, consider an arbitrary finite basis B, and apply the 
Steinitz exchange theorem (see Solution 33). The result is that there exists 
a total set that includes E and has the same number of elements as B; but 
is it obvious that that set must be independent? Yes, it is obvious. If it were 
dependent, then (see Problem 32) a proper subset of it would be a basis, 
contradicting the fact (Corollary 2 in Solution 33) that any two bases have 
the same number of elements. 

Note that the result answers the sample question about the set {u, v}
described before the statement of the problem: there does indeed exist a
basis of $\mathbb{C}^4$ containing u and v. One such basis is $\{u, v, x_1, x_2\}$.


Solution 44. 


If V is a vector space of dimension n, say, and if M is a subspace of V, then
M is indeed finite-dimensional, and, in fact, the dimension of M must be
less than or equal to n. If M = O, then the dimension of M is 0, and the
proof is complete. If M contains a non-zero vector $x_1$, let $M_1$ ($\subset M$) be
the subspace spanned by $x_1$. If $M = M_1$, then M has dimension 1, and the
proof is complete. If $M \ne M_1$, let $x_2$ be an element of M not contained in
$M_1$, and let $M_2$ be the subspace spanned by $x_1$ and $x_2$; and so on. After no
more than n steps the process reaches an end. Reason: the process yields
an independent set, and no such set can have more than n elements (since
every independent set can be extended to a basis, and no basis can have
more than n elements). The only way the process can reach an end is by
having the x's form a set that spans M, and the proof is complete.


Solution 45. 


A total set is minimal if and only if it is independent. The most natural 
way to approach the proofs of the two implications involved seems to be 
by contrapositives. That is: E is not minimal if and only if it is dependent. 

Suppose, indeed, that E is not minimal, which means that E has a
non-empty subset F such that the relative complement $E - F$ is total. If x is
any vector in F, then there exist vectors $x_1, \ldots, x_n$ in $E - F$ and there exist
scalars $\alpha_1, \ldots, \alpha_n$ such that


$$x = \sum_{j=1}^{n} \alpha_j x_j,$$


which implies, of course, that the subset $\{x, x_1, \ldots, x_n\}$ of E is dependent.
If, in reverse, E is dependent, then there exist vectors $x_1, \ldots, x_n$ in E
and there exist scalars $\alpha_1, \ldots, \alpha_n$, not all zero, such that


$$\sum_{j=1}^{n} \alpha_j x_j = 0.$$


Find i so that $\alpha_i \ne 0$, and note that


$$x_i = -\sum_{j \ne i} \frac{\alpha_j}{\alpha_i}\, x_j.$$


This implies that the set $F = E - \{x_i\}$ is just as total as E, and hence that
E is not minimal.


Solution 46. 


If E is a total subset of a finite-dimensional vector space V, express each 
vector in a basis of V as a linear combination of vectors in E. The vectors 
actually used in all these linear combinations form a finite total subset of E. 
That subset has an independent subsubset with the same span (see Prob- 
lem 33), and, therefore, that subsubset is total. Since an independent total 
set is minimal, the reasoning proves the existence of a minimal total subset 
of E. 

The conclusion remains true for spaces that are not finite-dimensional, 
but at least a part of the technique has to be different. What’s needed, given 
E, is an independent subset of E with the same span. A quick way to get 
one is to consider the set of all independent subsets of E and to find among 
them a maximal one. (That’s the same technique as is used to prove the 
existence of bases.) The span of such a maximal independent subset of E 
has to be the same as the span of E (for any smaller span would contradict 
maximality). Since the span of E is V, that maximal independent subset is 
itself total. Since an independent total set is a minimal total set (Problem 
45), the proof is complete: every total set has a minimal total subset.


Solution 47. 


An infinitely total set E always has an infinite subset F such that E — F is 
total. Here is one way to construct an F. 

Consider an arbitrary vector $x_1$ in E. Since, by assumption, $E - \{x_1\}$
is total, there exists a finite subset $E_1$ of $E - \{x_1\}$ whose span contains $x_1$.
Let $x_2$ be a vector in the relative complement $E - (\{x_1\} \cup E_1)$. Since, by
assumption, $E - (\{x_1, x_2\} \cup E_1)$ is total, it has a finite subset $E_2$ whose span
contains $x_2$. Keep iterating the procedure. That is, at the next step, let $x_3$
be a vector in


$$E - (\{x_1, x_2\} \cup E_1 \cup E_2),$$


note that removing $x_3$ as well still leaves a total set (only finitely many
vectors have been removed from E), and that, therefore, that smaller set has a
finite subset $E_3$ whose span contains $x_3$. The result of this iterative
procedure is an infinite set $F = \{x_1, x_2, x_3, \ldots\}$ with the property that $E - F$ is
total. Reason: $E_j$ is a subset of $E - F$ for each j, and therefore $x_j$ belongs
to the span of $E - F$ for each j.


Solution 48.


Assertion: if $\{x_1, \ldots, x_k\}$ is a relatively independent subset of $\mathbb{R}^n$, where
$k \ge n$, then there exists a vector $x_{k+1}$ such that $\{x_1, \ldots, x_k, x_{k+1}\}$ is
relatively independent.

For the proof, form all subsets of n - 1 vectors of $\{x_1, \ldots, x_k\}$, and, for
each such subset, form the subspace they span. (Note that the dimension of
each of those subspaces is exactly n - 1, not less. The reason is the assumed
relative independence. This fact is not needed in the proof, but it's good
to know anyway.) The construction results in a finite number of subspaces
that, between them, certainly do not exhaust $\mathbb{R}^n$; choose $x_{k+1}$ to be any
vector that does not belong to any of them. (The property of the field $\mathbb{R}$
that this argument depends on is that $\mathbb{R}$ is infinite.)

Why is the enlarged set relatively independent? To see that, suppose
that $y_1, \ldots, y_{n-1}$ are any n - 1 distinct vectors of the set $\{x_1, \ldots, x_k\}$. In
a non-trivial dependence relation connecting the y's and $x_{k+1}$, that is, in a
relation of the form


$$\sum_{i=1}^{n-1} \beta_i y_i + \alpha x_{k+1} = 0,$$


the coefficient $\alpha$ cannot be 0 (for otherwise the y's would be dependent).
Any such non-trivial dependence would, therefore, imply that $x_{k+1}$
belongs to the span of the y's, which contradicts the way that $x_{k+1}$ was
chosen. This completes the proof of the assertion.

Inductive iteration of the assertion (starting with an independent set of
n vectors) yields a relatively independent set $\{x_1, x_2, x_3, \ldots\}$ with infinitely
many elements.

A student familiar with cardinal numbers might still be unsatisfied. 
The argument proves, to be sure, that there is no finite upper bound to 
the possible sizes of relatively independent sets, but it doesn't completely 
answer the original question. Could it be, one can go on to ask, that there 
exist relatively independent sets with uncountably many elements? The an- 
swer is yes, but its proof seems to demand transfinite techniques (such as 
Zorn's lemma). 


Solution 49. 


Let q be the number of elements in the coefficient field $\mathbb{F}$ and let n be the
dimension of the given vector space over $\mathbb{F}$. Since a basis of $\mathbb{F}^n$ is a set of
exactly n independent n-tuples of elements of $\mathbb{F}$, the question is (or might
as well be): how many independent sets of exactly n vectors in $\mathbb{F}^n$ are there?

Any non-zero n-tuple can be the first element of a basis; pick one, and
call it $x_1$. Since the number of vectors in $\mathbb{F}^n$ is $q^n$, and since only the zero
vector is to be avoided, the number of possible choices at this stage is $q^n - 1$.
Any n-tuple that is not a scalar multiple of $x_1$ can follow $x_1$ as the second
element of a basis; pick one and call it $x_2$. Since the number of vectors in
$\mathbb{F}^n$ is $q^n$, and since only the scalar multiples of $x_1$ are to be avoided, the
number of possible choices at this stage is $q^n - q$. (Note that the number of
scalar multiples of $x_1$ is the same as the number of scalars, and that is q.)
The next step in this inductive process is typical of the most general step.
Any n-tuple that is not a linear combination of $x_1$ and $x_2$ can follow $x_1$
and $x_2$ as the third element of a basis; pick one and call it $x_3$. Since the
number of vectors in $\mathbb{F}^n$ is $q^n$, and since only the linear combinations of
$x_1$ and $x_2$ are to be avoided, the number of possible choices at this stage is
$q^n - q^2$. (The number of linear combinations of two independent vectors
is the number of elements in the set of all pairs of scalars, and that is $q^2$.)
Keep going the same way a total of n times altogether; the final answer is
the product


$$(q^n - 1)(q^n - q)(q^n - q^2) \cdots (q^n - q^{n-1})$$


of the partial answers obtained along the way.

Caution: this product is not the number of bases, but the number of 
ordered bases, the ones in which a basis obtained by permuting the vec- 
tors of one already at hand is considered different from the original one. 
(Emphasis: the permutations here referred to are not permutations of co- 
ordinates in an n-tuple, but permutations of the vectors in a basis.) To get 
the number of honest (unordered) bases, divide the answer by n!. 

A curious subtlety arises in this kind of counting. If $\mathbb{F} = \mathbb{Z}_2$, and the
formula just derived is applied to $\mathbb{F}^3$ (that is, q = 2 and n = 3), it yields


$$(8 - 1)(8 - 2)(8 - 4) = 168$$


ordered bases, and, therefore, 28 unordered ones. Related question: how
many bases for $\mathbb{R}^3$ are there in which each vector (ordered triple of real
numbers) is permitted to have the coordinates 0 and 1 only? A not too
laborious count yields the answer 29. What accounts for the difference?
Answer: the set {(0, 1, 1), (1, 0, 1), (1, 1, 0)} is a basis for $\mathbb{R}^3$, but the same
symbols interpreted modulo 2 describe a subset of $\mathbb{F}^3$ that is not a basis.
(Why not?)
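
Both counts in this solution are small enough to confirm by brute force. The following Python sketch is an illustration only; the helper names `ordered_bases_count` and `det3` are hypothetical, and independence of three vectors is tested with a 3-by-3 integer determinant (non-zero for the real count, odd for the count modulo 2).

```python
from itertools import combinations, product


def ordered_bases_count(q, n):
    """The product (q^n - 1)(q^n - q)...(q^n - q^(n-1)) from the solution."""
    count = 1
    for k in range(n):
        count *= q**n - q**k
    return count


def det3(a, b, c):
    """Determinant of the 3x3 integer matrix with rows a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


# The seven non-zero triples whose coordinates are 0 or 1.
vectors = [v for v in product((0, 1), repeat=3) if any(v)]
triples = list(combinations(vectors, 3))

# Unordered bases of R^3: the determinant is non-zero.
real_bases = [t for t in triples if det3(*t) != 0]
# Unordered bases of (Z_2)^3: the determinant is non-zero modulo 2.
mod2_bases = [t for t in triples if det3(*t) % 2 != 0]

print(ordered_bases_count(2, 3))           # ordered bases over Z_2
print(len(mod2_bases), len(real_bases))    # unordered counts
print(set(real_bases) - set(mod2_bases))   # the triple that explains the gap
```

The last line prints the single triple that is a basis over the reals but not modulo 2, which is the one named in the solution.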


Solution 50. 


The wording of the question suggests that the direct sum of two
finite-dimensional vector spaces is finite-dimensional. That is true, and the best
way to prove it is to use bases of the given vector spaces to construct a basis
of their direct sum.

If $\{x_1, \ldots, x_n\}$ and $\{y_1, \ldots, y_m\}$ are bases for U and V respectively,
then it seems natural to look at the set B of vectors


$$(x_1, 0), \ldots, (x_n, 0), (0, y_1), \ldots, (0, y_m),$$


and try to prove that it is a basis for $U \oplus V$.

The easiest thing to see is that B spans $U \oplus V$. Indeed, since every
vector x in U is a linear combination of the $x_i$'s, it follows that every vector
of the form (x, 0) in $U \oplus V$ is a linear combination of the $(x_i, 0)$'s. Similarly,
every vector of the form (0, y) is a linear combination of the $(0, y_j)$'s, and
those two conclusions together imply that every vector (x, y) in $U \oplus V$ is a
linear combination of the vectors in B.

Is it possible that the set B is dependent? If


$$\alpha_1(x_1, 0) + \cdots + \alpha_n(x_n, 0) + \beta_1(0, y_1) + \cdots + \beta_m(0, y_m) = (0, 0),$$
then


$$\Bigl(\sum_i \alpha_i x_i,\ \sum_j \beta_j y_j\Bigr) = (0, 0),$$


and it follows from the independence of the $x_i$'s and of the $y_j$'s that
$\alpha_1 = \cdots = \alpha_n = \beta_1 = \cdots = \beta_m = 0$, and the proof is complete.


Solution 51. 


(a) Let the role of V be played by the vector space P of all real polynomials,
and let M be the subspace of all even polynomials (see Problem 25). When
are two polynomials equal (congruent) modulo M? Answer: when their
difference is even. When, in particular, is a polynomial equal to 0 modulo
M? Answer: when it is even. Consequence: if $p_n(x) = x^{2n+1}$, for n =
0, 1, 2, ..., then a non-trivial linear combination of a finite set of these $p_n$'s
can never be 0 modulo M. Reason: in any linear combination of them, let k
be the largest index for which the coefficient of $p_k$ is not 0, and note that in
that case the degree of the linear combination will be 2k + 1 (which is not
even). Conclusion: the quotient space V/M has an infinite independent
subset, which implies, of course, that it is not finite-dimensional.

(b) If, on the other hand, N is the subspace of all polynomials p for
which p(0) = 0 (the constant term is 0), then the equality of two
polynomials modulo N simply means that they have the same constant term.
Consequence: every polynomial is congruent modulo N to a scalar
multiple of the constant polynomial 1, which implies that the dimension of V/N
is 1. If bigger examples are wanted, just make N smaller. To be specific, let
N be the set of all those polynomials in which not only the constant term
is required to be 0, but the coefficients of the powers $x$, $x^2$, and $x^3$ are
required to be 0 also. Consequence: every polynomial is congruent modulo
N to a polynomial of degree 3 at most, which implies that the dimension
of V/N is 4.


Solution 52.


If M is an m-dimensional subspace of an n-dimensional vector space V,
then V/M has dimension n - m. Only one small idea is needed to begin
the proof; after that everything becomes mechanical. The assumption
that dim V = n means that a basis of V has n elements; the small idea is to
use a special kind of basis, the kind that begins as a basis of M. To say that
more precisely, let $\{x_1, \ldots, x_m\}$ be a basis for M, and extend it, by
adjoining suitable vectors $x_{m+1}, \ldots, x_n$, so as to make it a basis of V. From now
on no more thinking is necessary; the natural thing to try to do is to prove
that the cosets


$$x_{m+1} + M, \ldots, x_n + M$$


form a basis for V/M.

Do they span V/M? That is: if $x \in V$, is the coset x + M necessarily a
linear combination of them? The answer is yes, and the reason is that x is
a linear combination of $x_1, \ldots, x_n$, so that


$$x = \sum_{i=1}^{n} \alpha_i x_i$$
for suitable coefficients. Since
$$\sum_{i=1}^{m} \alpha_i x_i$$
is congruent to 0 modulo M, it follows that


$$x + M = \sum_{i > m} \alpha_i (x_i + M),$$
and that's exactly what's wanted.

Are the cosets $x_{m+1} + M, \ldots, x_n + M$ independent? Yes, and the
reason is that the vectors $x_{m+1}, \ldots, x_n$ are independent modulo M. Indeed,
if a linear combination of these vectors turned out to be equal to a vector,
say z, in M, then z would be a linear combination of $x_1, \ldots, x_m$, and the
only possible linear combination it could be is the trivial one (because the
totality of the x's is independent).

The proof is complete, and it proved more than was promised: it
concretely exhibited a basis of n - m elements for V/M.


Solution 53. 


The answer is easy to guess, easy to understand, and easy to prove, but it is 
such a frequently occurring part of mathematics that it's well worth a few 
extra minutes of attention. The reason it is easy to guess is that span and 
dimension behave (sometimes, partially) the same way as union and count- 
ing. The number of elements in the union of two finite sets is not the sum 
of their separate numbers—not unless the sets are disjoint. If they are not 
disjoint, then adding the numbers counts twice each element that belongs 
to both sets—the sum of the numbers of the separate sets is the number of 
elements in the union plus the number of elements in the intersection. The 
same sort of thing is true for spans and dimensions; the correct version of 
the formula in that case is 


$$\dim(M + N) + \dim(M \cap N) = \dim M + \dim N.$$


The result is sometimes known as the modular equation.
To prove it, write $\dim(M \cap N) = k$, and choose a basis


$$\{z_1, \ldots, z_k\}$$
for $M \cap N$. Since a basis for a subspace can always be extended to a basis
for any larger space, there exist vectors $x_1, \ldots, x_m$ such that the set
$$\{x_1, \ldots, x_m, z_1, \ldots, z_k\}$$


is a basis for M; in this notation
$$\dim M = m + k.$$
Similarly, there exist vectors $y_1, \ldots, y_n$ such that the set
$$\{y_1, \ldots, y_n, z_1, \ldots, z_k\}$$
is a basis for N; in this notation
$$\dim N = n + k.$$


The span of the x's is disjoint from N, in the sense that the two have only
the vector 0 in common (for otherwise the x's and z's together
couldn't be independent), and, similarly, the span of the y's is disjoint from
M. It follows that the set


$$\{x_1, \ldots, x_m, y_1, \ldots, y_n, z_1, \ldots, z_k\}$$


is a basis for M + N. The desired equation, therefore, takes the form


$$(m + n + k) + k = (m + k) + (n + k),$$


which is obviously true. (Note how the intersection $M \cap N$ is "counted
twice" on both sides of the equation.)
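
The modular equation is easy to test numerically on a small example. The sketch below is an illustration under stated assumptions, not part of the original solution: `rank` is a hypothetical helper that computes the dimension of a span by Gaussian elimination over the rationals, and the subspaces M and N of Q^4 are chosen by hand so that the vector w = (1, 2, 1, 0) visibly lies in both.

```python
from fractions import Fraction


def rank(vectors):
    """Dimension of the span of the given vectors (exact rational arithmetic)."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    r = 0  # number of pivots found so far
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


M_basis = [(1, 1, 0, 0), (0, 1, 1, 0)]
N_basis = [(1, 2, 1, 0), (0, 0, 0, 1)]
w = (1, 2, 1, 0)  # the sum of the two basis vectors of M, and a basis vector of N

dim_M = rank(M_basis)
dim_N = rank(N_basis)
dim_sum = rank(M_basis + N_basis)          # dim(M + N)
dim_intersection = dim_M + dim_N - dim_sum  # what the modular equation predicts

# w lies in M and in N exactly when adjoining it changes neither rank.
w_in_both = rank(M_basis + [w]) == dim_M and rank(N_basis + [w]) == dim_N
print(dim_M, dim_N, dim_sum, dim_intersection, w_in_both)
```

Here the equation predicts a 1-dimensional intersection, and the membership check confirms that the non-zero vector w does belong to both subspaces.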

That's all there is to it, but the proof has a notational blemish that is 
frequent in mathematical exposition. It is quite possible that some of the 
dimensions under consideration in the proof are 0; the case 


$$\dim(M \cap N) = 0,$$


for instance, is of special interest. In that special case the notation is
inappropriate: the suffix on $z_1$ suggests that $M \cap N$ has a non-empty basis,
which is false. It is not difficult to cook up a defensible notational system 
in such situations, but usually it's not worth the trouble; it is easier (and 
no less rigorous) just to remember that in case something is 0 a part of the 
argument goes away. 


Chapter 4. Transformations 


Solution 54. 
(a) The definitions (1) and (3) yield linear transformations; the definition
(2) does not. The verification of linearity in (1) is boring but easy; just
replace (x, y) by an arbitrary linear combination

$$\alpha_1(\xi_1, \eta_1) + \alpha_2(\xi_2, \eta_2),$$


apply T, and compare the result with the result of doing things in the other
order. Here it is, for the record. Do NOT read it till after trying to write it
down independently, and, preferably, do not ever read it.

First:
$$T(\alpha_1(\xi_1, \eta_1) + \alpha_2(\xi_2, \eta_2)) = T(\alpha_1\xi_1 + \alpha_2\xi_2,\ \alpha_1\eta_1 + \alpha_2\eta_2)$$
$$= \bigl(\alpha(\alpha_1\xi_1 + \alpha_2\xi_2) + \beta(\alpha_1\eta_1 + \alpha_2\eta_2),\ \gamma(\alpha_1\xi_1 + \alpha_2\xi_2) + \delta(\alpha_1\eta_1 + \alpha_2\eta_2)\bigr).$$

Second:


$$\alpha_1 T(\xi_1, \eta_1) + \alpha_2 T(\xi_2, \eta_2)$$
$$= \alpha_1(\alpha\xi_1 + \beta\eta_1,\ \gamma\xi_1 + \delta\eta_1) + \alpha_2(\alpha\xi_2 + \beta\eta_2,\ \gamma\xi_2 + \delta\eta_2).$$


Third and last: compare the second lines of these equations.

As for (2): its linearity was already destroyed by the squaring coun- 
terexample in the discussion before the statement of the problem. Check 
it. 

The example (3) is the same as (1); the only difference is in the names 
of the fixed scalars. 

(b) As before, the definitions (1) and (3) yield linear transformations
and the definition (2) does not. To discuss (1), look at any typical
polynomial, such as, say


$$9x^3 - 3x^2 + 2x - 5,$$
and do what (1) says to do, namely, replace $x$ by $x^2$. The result is
$$9x^6 - 3x^4 + 2x^2 - 5.$$


Then think of doing this to two polynomials, that is, to two elements of P,
and forming the sum of the results. Is the outcome the same as if the
addition had been performed first and only then was $x$ replaced by $x^2$? Do this
quite generally: think of two arbitrary polynomials, think of adding them
and then replacing $x$ by $x^2$, and compare the result with what would have
happened if you had replaced $x$ by $x^2$ first and added afterward. It's not
difficult to design suitable notation to write this down in complete
generality, but thinking about it without notation is more enlightening, and the
answer is yes. Yes, the results are the same. That's a statement about
addition, which is a rather special linear combination, but the scalars that enter
into linear combinations have no effect on the good outcome.

The definition (2) is the bad kind of squaring once more.
Counterexample: consider the polynomial (vector) $p(x) = x$ and the scalar 2, and
compare $T(2p(x))$ with $2Tp(x)$. The first is $(2p(x))^2$, which is $4x^2$, and
the second is $2x^2$. Question: what happens if p(x) is replaced by the even
simpler polynomial $p(x) = 1$; is that a counterexample also?

The discussion of (3) can be carried out pretty much the same way
as the discussion of (1): instead of talking about linear combinations and
replacing $x$ by $x^2$, talk about linear combinations and multiply them by $x^2$.
It doesn't make any difference which is done first; the formula (3) does
indeed define a linear transformation.
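
The three definitions in part (b) can also be tried out mechanically. The following Python sketch is an illustration only: it represents a polynomial as a list of coefficients (index = degree), and the function names `substitute_square`, `square`, and `times_x_squared` are hypothetical labels for the definitions (1), (2), and (3). A spot check on two polynomials is of course not a proof of linearity, but the failure of (2) is a genuine counterexample.

```python
def trim(p):
    """Drop trailing zero coefficients so that equal polynomials compare equal."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def add(p, q):
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return trim(a + b for a, b in zip(p, q))


def scale(c, p):
    return trim(c * a for a in p)


def substitute_square(p):
    """(1): replace x by x^2, so the coefficient of x^k moves to x^(2k)."""
    out = [0] * (2 * len(p))
    for k, a in enumerate(p):
        out[2 * k] = a
    return trim(out)


def square(p):
    """(2): the polynomial p(x)^2."""
    out = [0] * (2 * len(p))
    for i, a in enumerate(p):
        for j, b in enumerate(p):
            out[i + j] += a * b
    return trim(out)


def times_x_squared(p):
    """(3): multiply p(x) by x^2."""
    return trim([0, 0] + list(p))


p = [-5, 2, -3, 9]   # 9x^3 - 3x^2 + 2x - 5
q = [1, 0, 4]        # 4x^2 + 1
x = [0, 1]           # the polynomial x

print(substitute_square(p))                         # coefficients of 9x^6 - 3x^4 + 2x^2 - 5
for T in (substitute_square, times_x_squared):
    lhs = T(add(scale(2, p), scale(3, q)))
    rhs = add(scale(2, T(p)), scale(3, T(q)))
    print(T.__name__, lhs == rhs)                   # linear on this sample
print(square(scale(2, x)), scale(2, square(x)))     # 4x^2 versus 2x^2
```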


Solution 55.


(1) If F is a linear functional defined on a vector space V, then either F(v)
is 0 for every vector v in V, or it is not. (The possibility is a realistic one:
the equation F(v) = 0 does indeed define a linear functional on every
vector space.) If F(v) = 0 for all v, then ran F just consists of the vector
0 (in $\mathbb{R}^1$), and nothing else has to be said. If that is not the case, then the
range of F contains some vector $x_0$ in $\mathbb{R}^1$ (a real number) different from
0. To say that $x_0$ is in the range means that V contains some vector $v_0$ such
that $F(v_0) = x_0$. Since F is a linear functional (linear transformation), it
follows in particular that


$$F(x v_0) = x x_0$$


for every real number x. As x ranges over all real numbers, so does the
product $x x_0$. Conclusion: the range of F is all of $\mathbb{R}^1$.

(2) The replacement of x by x + 2 is a change of variables similar to 
(but simpler than) the replacement of $x$ by $x^2$ considered in Problem 54
(1 (b)), and the proof that it is a linear transformation is similar to (but 
simpler than) what it was there. Squaring the variable can cause trouble 
because it usually raises the degree of the polynomial to which it is done 
(usually?—does it ever not do so?); the present simple change of variables 
does not encounter even that difficulty. 

(3) The range of this transformation contains only one vector, namely 
(0, 0); it is indeed a linear transformation. 

(4) The equation does not define a linear transformation. Counterex- 
amples are not only easy to find—they are hard to miss. For a special one, 
consider the vector (0, 0,0) and the scalar 2. Is it true that 


$$T(2 \cdot (0, 0, 0)) = 2 \cdot T(0, 0, 0)?$$


The left side of the equation is equal to T(0, 0, 0), which is (2, 2); the right
side, on the other hand, is equal to $2 \cdot (2, 2)$, which is (4, 4).

(5) The "weird" vector space, call it W for the time being, is really the
easy vector space $\mathbb{R}^1$ in disguise; they differ in notation only. That
statement is worth examining in detail.

Suppose that two people, call them P and Q, play a notation game.
Player P is thinking of the vector space $\mathbb{R}^1$, but as he plays the game he
never says anything about the vectors that are in his thoughts; he writes
everything. His first notational whimsy is to enclose every vectorial symbol
in a box; instead of writing a vector x (in the present case a real number),
he writes $\boxed{x}$, and instead of writing something like 2 + 3 = 5 or something
like $2 \cdot 3 = 6$, he writes


$$\boxed{2} \boxplus \boxed{3} = \boxed{5} \quad\text{or}\quad 2 \boxdot \boxed{3} = \boxed{6}.$$


(Note: "2" in the last equation is a scalar, not a vector; that's why its symbol
is not, should not be, in a box.) Player Q wouldn't be seriously mystified by
such a thin disguise.

Suppose next that the notational change is a stranger one: the
operational symbols + and $\cdot$ continue to appear in boxes, but the symbols for
vectors appear as exponents with the base 2. (Caution: vectors, not scalars.)
In that case every time P thinks of a vector x what he writes is the number
s obtained by using x as an exponent on the base 2. Example: P thinks 1
and writes 2; P thinks 0 and writes 1; P thinks 2 and writes 4; P thinks 1/2 and
writes $\sqrt{2}$; P thinks -3 and writes 1/8. What will Q ever see? Since s is
positive no matter what x is (that is, $2^x$ is positive no matter what real number
x is), all the numbers that Q will ever see are positive. As x ranges over all
possible real numbers, the exponential s (that is, $2^x$) ranges over all
possible positive real numbers. When P adds two real numbers (vectors), x and
y say, what he reports to Q is $s \boxplus t$, where $s = 2^x$ and $t = 2^y$. Example:
when P adds 1 and 2 and gets 3, the report that Q sees is $2 \boxplus 4 = 8$. As
far as Q is concerned the numbers he is looking at were multiplied.

Scalar multiplication causes a slight additional notational headache.
Both P and Q are thinking about a real vector space, which means that
both are thinking about vectors, but P's vectors are numbers in $\mathbb{R}^1$ and
Q's vectors are numbers in $\mathbb{R}_+$. Scalars, however, are the same for both,
just plain real numbers. When P thinks of multiplying a real number x (a
vector) by a real number y (a scalar), the traditional symbol for what he
gets is yx, but what he writes is


$$y \boxdot s = t,$$


where $s = 2^x$ and $t = 2^{yx}$. Notice that $2^{yx} = (2^x)^y$, or, in other words,
$t = s^y$. Example: when P is thinking (in traditional notation) about $3 \cdot 2 = 6$,
what Q sees is $3 \boxdot 4 = 64$, which he interprets to mean that the scalar
multiple of 4 by 3 has to be obtained by raising 4 to the power 3.

That's it: the argument shows (doesn't it?) that $\mathbb{R}^1$ and $\mathbb{R}_+$ differ in
notation only. Yes, $\mathbb{R}_+$ is indeed a vector space. If T is defined on $\mathbb{R}_+$ by
$T(s) = \log_2 s$ (note: log to the base 2), then T in effect decodes the
notation that P encoded. When Q applies T to a vector s in $\mathbb{R}_+$, and gets
$\log_2 s$, he recaptures the notation that P disguised. Thus, in particular,
when $s = 2^x$ and $t = 2^y$ and T is applied to $s \boxplus t$, the result is $\log_2(2^x \cdot 2^y)$,
which is x + y. In other words,


$$T(s \boxplus t) = Ts + Tt,$$


which is a part of the definition of a linear transformation. The other part,
the one about scalar multiples, goes the same way: if $s = 2^x$ and $t = 2^{yx}$,
then


$$T(y \boxdot s) = T(t) = yx = yT(s).$$


There is nothing especially magical about $\log_2$; logarithms to other
bases could have been used just as well. Just remember that $\log_{10} s$, for
instance, is just a constant multiple of $\log_2 s$; in fact


$$\log_{10} s = (\log_{10} 2) \cdot \log_2 s$$


for every positive real number s. If T had been defined by
$$T(s) = \log_{10} s,$$


the result would have been the same; the constant factor $\log_{10} 2$ just goes
along for the ride.
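
The notation game of part (5) can be replayed in a few lines of Python. This is only a sketch under stated assumptions: `box_plus` and `box_dot` are hypothetical names for the boxed operations on positive reals, `T` is the base-2 logarithm, and floating-point comparisons are made with `math.isclose` rather than exact equality.

```python
import math


def box_plus(s, t):
    """Q's 'addition' of vectors in R_+: ordinary multiplication."""
    return s * t


def box_dot(y, s):
    """Q's 'scalar multiplication': raise the vector s to the power y."""
    return s ** y


def T(s):
    """Decode Q's vector back into P's notation."""
    return math.log2(s)


print(box_plus(2, 4))   # P added 1 and 2
print(box_dot(3, 4))    # P multiplied the vector 2 by the scalar 3

# T turns the boxed operations back into the ordinary ones.
s, t, y = 5.0, 0.3, -1.7
additive = math.isclose(T(box_plus(s, t)), T(s) + T(t))
homogeneous = math.isclose(T(box_dot(y, s)), y * T(s))

# Changing the base only rescales T by the constant log10(2).
rescaled = math.isclose(math.log10(s), math.log10(2) * math.log2(s))
print(additive, homogeneous, rescaled)
```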


Solution 56. 


(1) What do you know about a function if you know that its indefinite inte- 
gral is identically 0? Answer: the function must have been 0 to start with. 
Conclusion: the kernel of the integration transformation is {0}. 

(2) What do you know about a function if you know that its derivative is 
identically 0? Answer: the function must be a constant. Conclusion: ker D 
is the set of all constant polynomials. 


(3) How can it happen that
$$2x + 3y = 0$$
and
$$7x - 5y = 0?$$


To find out, eliminate x. Since


$$7 \cdot 2x + 7 \cdot 3y = 0$$
and


$$2 \cdot 7x - 2 \cdot 5y = 0,$$
therefore


$$21y + 10y = 0,$$


or y = 0, and from that, in turn, it follows that


$$2x + 3y = 2x + 3 \cdot 0 = 2x = 0,$$


and hence that x = 0. Conclusion: ker T = {(0, 0)}.
(4) How can it happen for a polynomial p that $p(x^2) = 0$? Recall, for
instance, that if


$$p(x) = 9x^3 - 3x^2 + 2x - 5,$$
then
$$p(x^2) = 9x^6 - 3x^4 + 2x^2 - 5;$$


the only way that can be 0 is by having all its coefficients equal to 0, which
happens only when all the coefficients of p were 0 to begin with. (See
Problem 54 (2 (a)).) Conclusion: the kernel of this change of variables is {0}.

(5) To say that $T(x, y) = (0, 0)$ is the same as saying that
$(x, 0) = (0, 0)$, and that is the same as saying that $x = 0$. In other
words, if $(x, y)$ is in the kernel of $T$, then $(x, y) = (0, y)$.
Conclusion: $\ker T$ is the $y$-axis.

(6) This is an old friend. The question is this: for which vectors $(x, y)$
in $\mathbb{R}^2$ is it true that $x + 2y = 0$? Answer: the ones for which it is true, and
nothing much more intelligent can be said about them, except that the set 
was encountered before and given the name R2. (See Problem 22.) 


Solution 57. 


(1) The answer is yes: the stretching transformation, which is just scalar
multiplication by 7, commutes with every linear transformation. The
computation is simple: if $v$ is an arbitrary vector, then

$$
\begin{aligned}
(ST)v &= S(Tv) && \text{by the definition of composition} \\
      &= 7(Tv) && \text{by the definition of } S
\end{aligned}
$$

and

$$
\begin{aligned}
(TS)v &= T(Sv) && \text{by the definition of composition} \\
      &= T(7v) && \text{by the definition of } S \\
      &= 7(Tv) && \text{by the linearity of } T.
\end{aligned}
$$

The number 7 has, of course, nothing to do with all this: the same
conclusion is true for every scalar transformation. (For every scalar $c$ the
linear transformation $S$ defined for every vector $v$ by $Sv = cv$ is itself
called a scalar. Words are stretched by this usage, but in a harmless way,
and breath is saved.) The proof is often compressed into one line (slightly
artificially) as follows:

$$(ST)v = S(Tv) = c(Tv) = T(cv) = T(Sv) = (TS)v.$$


(2) The question doesn't make sense; $S \colon \mathbb{R}^3 \to \mathbb{R}^3$
(that is, $S$ is a transformation from $\mathbb{R}^3$ to $\mathbb{R}^3$) and
$T \colon \mathbb{R}^3 \to \mathbb{R}^2$, so that $TS$ can be formed, but
$ST$ cannot.

(3) Here $Sp(x) = p(x^2)$ and $Tp(x) = x^2 p(x)$. If $p(x) = x$, then
$STp(x)$ (a logical fussbudget would write $((ST)p)(x)$, but the fuss
doesn't really accomplish anything) $= STx = x^4 \cdot x^2 = x^6$ and
$TSx = Tx^2 = x^2 \cdot x^2 = x^4$, and that's enough to prove that $S$ and
$T$ do not commute.

A student inexperienced with thinking about the minimal, barebones,
extreme cases that are usually considered mathematically the most elegant
might prefer to examine a more complicated polynomial (not just $x$, but,
say, $1 + 2x + 3x^2$). For the brave student, however, there is an even more
extreme case to look at (more extreme than $x$): the polynomial $p(x) = 1$.
The action of $T$ on 1 is obvious: $T1 = x^2$. What is the action of $S$ on
1? Answer: the result of replacing the variable $x$ by $x^2$ throughout, and
since $x$ does not explicitly appear in 1, the consequence is that $S1 = 1$.
Consequence: $ST1 = Sx^2 = x^4$ and $TS1 = T1 = x^2$. Conclusion: (as
before) $S$ and $T$ do not commute.

To say that $S$ and $T$ do not commute means, of course, that the
compositions $ST$ and $TS$ are not the same linear transformation, and that,
in turn, means that they disagree at at least one vector. It might happen
that they agree at many vectors, but just one disagreement ruins
commutativity. Do the present $ST$ and $TS$ agree anywhere? Sure: they agree
at the vector 0. Anywhere else? That's a nice question, and it's worth a
moment's thought here. Do $ST$ and $TS$ agree at any polynomial other than 0?
Since

$$STp(x) = S(x^2 p(x)) = x^4 p(x^2)$$

and

$$TSp(x) = Tp(x^2) = x^2 p(x^2),$$

the question reduces to this: if $p \ne 0$, can $x^4 p(x^2)$ and
$x^2 p(x^2)$ ever be the same polynomial? The answer is obviously no: if that
equation held for $p \ne 0$, it would follow that $x^2 = x^4$, which is
ridiculous. (Careful: $x^2 = x^4$ is not an equation to be solved for an
unknown $x$. It offers itself as an equation, an identity, between two
polynomials, and that's what's ridiculous.)
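
The two test cases are easy to replay mechanically. The sketch below represents a polynomial by its list of coefficients (constant term first), which is an illustrative choice and not part of the text; $S$ substitutes $x^2$ for $x$ and $T$ multiplies by $x^2$.

```python
def S(p):
    # p = [a0, a1, a2, ...]; replace x by x^2, so a_k moves to degree 2k.
    out = [0] * (2 * len(p) - 1)
    for k, a in enumerate(p):
        out[2 * k] = a
    return out

def T(p):
    # Multiply by x^2: shift every coefficient up by two degrees.
    return [0, 0] + list(p)

x = [0, 1]      # the polynomial x
one = [1]       # the polynomial 1

st_x, ts_x = S(T(x)), T(S(x))          # x^6 versus x^4
st_one, ts_one = S(T(one)), T(S(one))  # x^4 versus x^2
```

In both cases the two coefficient lists have different lengths, which is the list-level shadow of the statement that $x^2$ and $x^4$ are different polynomials.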

(4) Since $S \colon \mathbb{R}^2 \to \mathbb{R}^1$ and
$T \colon \mathbb{R}^1 \to \mathbb{R}^2$, both products $ST$ and $TS$ make
sense, and, in fact,

$$ST \colon \mathbb{R}^1 \to \mathbb{R}^1 \quad \text{and} \quad TS \colon \mathbb{R}^2 \to \mathbb{R}^2.$$

It may be fun to calculate what $ST$ and $TS$ are, but for present purposes it
is totally unnecessary. The point is that $ST$ is a linear transformation on
$\mathbb{R}^1$ and $TS$ is a linear transformation on $\mathbb{R}^2$; the two
have different domains and it doesn't make sense to ask whether they are
equal. No, that's not correctly said: it makes sense to ask, but the answer
is simply no.

(5) To decide whether $STp(x) = TSp(x)$ for all $p$, look at a special
case, and, in particular, look at an extreme one such as $p(x) = 1$, and
hope that it solves the problem. Since $S1 = 1$ and $T1 = 1$, it follows
that $ST1 = TS1 = 1$. Too bad: that doesn't settle anything. What about
$p(x) = x$? Since $STx = S0 = 0$ and $TSx = T(x + 2) = 2$, that does settle
something: $S$ and $T$ do not commute.

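(6)-(1) The scalar 7 doesn't affect domain, range, or kernel: the question
is simply about $\operatorname{dom} T$, $\operatorname{ran} T$, and
$\ker T$. Answer:

$$\operatorname{dom} T = \operatorname{ran} T = \mathbb{R}^2, \quad \text{and} \quad \ker T = \{0\}.$$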


(6)-(2) Since $TS(x, y, z) = T(7x, 7y, 7z) = (7x, 7y)$, it follows easily
that $\operatorname{dom} TS = \mathbb{R}^3$, $\operatorname{ran} TS = \mathbb{R}^2$,
and $\ker TS$ is the set of all those vectors $(x, y, z)$ in $\mathbb{R}^3$
for which $x = y = 0$, that is, the $z$-axis. (Look at the whole question
geometrically.) Since there is no such thing as $ST$, the part of the
question referring to it doesn't make sense.

(6)-(3) The domains are easy: $\operatorname{dom} ST = \operatorname{dom} TS = \mathbb{P}$.
The kernels are easy too: since

$$STp(x) = S(x^2 p(x)) = x^4 p(x^2)$$

and

$$TSp(x) = Tp(x^2) = x^2 p(x^2),$$

it follows that $\ker ST = \ker TS = \{0\}$. The question about ranges takes a
minute of thought. It amounts to this: which polynomials are of the form
$x^2 p(x^2)$, and which are of the form $x^4 p(x^2)$? Answer:
$\operatorname{ran} TS$ is the set of all even polynomials with 0 constant
term, and $\operatorname{ran} ST$ is the set of all those even polynomials in
which, in addition, the coefficient of $x^2$ is 0 also.

(6)-(4) Now is the time to calculate the products:

$$STx = S(x, x) = x + 2x = 3x$$

and

$$TS(x, y) = T(x + 2y) = (x + 2y, x + 2y).$$

Answers: $\operatorname{dom} ST = \mathbb{R}^1$, $\operatorname{dom} TS = \mathbb{R}^2$,
$\operatorname{ran} ST = \mathbb{R}^1$, $\operatorname{ran} TS$ is the
“diagonal” consisting of all vectors $(x, y)$ in $\mathbb{R}^2$ with $x = y$,
$\ker ST = \{0\}$, and $\ker TS$ is the line with the equation $x + 2y = 0$.

(6)-(5) To find the answer to (5) the only calculations needed were for
$STx$ and $TSx$. To get more detailed information, more has to be
calculated, as follows:

$$
\begin{aligned}
ST(\alpha + \beta x + \gamma x^2 + \delta x^3) &= S(\alpha + \gamma x^2) \\
&= \alpha + \gamma(x + 2)^2 = (\alpha + 4\gamma) + 4\gamma x + \gamma x^2
\end{aligned}
$$

and

$$
\begin{aligned}
TS(\alpha + \beta x + \gamma x^2 + \delta x^3) &= T(\alpha + \beta(x + 2) + \gamma(x + 2)^2 + \delta(x + 2)^3) \\
&= (\alpha + 2\beta + 4\gamma + 8\delta) + (\gamma + 6\delta)x^2.
\end{aligned}
$$

There is no trouble with domains (both are $\mathbb{P}_3$). The range of $ST$
is the set of all those quadratic polynomials for which the coefficient of
$x$ is four times the coefficient of $x^2$, and the range of $TS$ is the set
of all those quadratic polynomials for which the coefficient of $x$ is 0. The
kernel of $ST$ is the set of all those cubic polynomials, that is,
polynomials of the form

$$\alpha + \beta x + \gamma x^2 + \delta x^3,$$

for which $\alpha = \gamma = 0$, and the kernel of $TS$ is the set of all
those whose coefficients satisfy the more complicated equations

$$\alpha + 2\beta + 4\gamma + 8\delta = \gamma + 6\delta = 0.$$
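
Expansions of this kind are easy to get wrong by hand, so a mechanical check may be welcome. The sketch below assumes, as the computation above does, that $S$ replaces $x$ by $x + 2$ and that $T$ keeps only the constant and the $x^2$ terms of a cubic; the coefficient-list representation and the function names are illustrative choices.

```python
from math import comb

def S(p):
    # p = [a0, a1, a2, a3]; return the coefficients of p(x + 2),
    # using the binomial expansion of (x + 2)^k.
    out = [0] * len(p)
    for k, a in enumerate(p):
        for j in range(k + 1):
            out[j] += a * comb(k, j) * 2 ** (k - j)
    return out

def T(p):
    # Keep the constant and x^2 coefficients; drop those of x and x^3.
    return [p[0], 0, p[2], 0]

alpha, beta, gamma, delta = 1, 2, 3, 4
p = [alpha, beta, gamma, delta]

st = S(T(p))   # expected (alpha + 4*gamma) + 4*gamma*x + gamma*x^2
ts = T(S(p))   # expected (alpha + 2*beta + 4*gamma + 8*delta) + (gamma + 6*delta)*x^2
```

With these sample coefficients the coefficient of $x$ in `st` is four times the coefficient of $x^2$, in agreement with the description of the range of $ST$.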


Solution 58. 


Yes, $\operatorname{ran} A \subset \operatorname{ran} B$ implies the existence
of a linear transformation $T$ such that $A = BT$. The corresponding
necessary condition for right divisibility, $A = SB$, is

$$\ker B \subset \ker A,$$

and it too is sufficient.

The problem is, given a vector $x$ in the vector space $V$, to define $Tx$,
and, moreover, to do it so that $Ax$ turns out to be equal to $BTx$. Put
$y = Ax$, so that $y \in \operatorname{ran} A$; the assumed condition then
implies that $y \in \operatorname{ran} B$. That means that $y = Bz$ for some
$z$, and the temptation is to define $Tx$ to be $z$. That might not work. The
difficulty is one of ambiguity: $z$ is not uniquely determined by $y$. It
could well happen that $y$ is equal to both $Bz_1$ and $Bz_2$; should $Tx$ be
$z_1$ or $z_2$?

If $Bz_1 = Bz_2$, then $B(z_1 - z_2) = 0$, which says that

$$z_1 - z_2 \in \ker B.$$

The way to avoid the difficulty is to stay far away from $\ker B$, and the
way to do that is to concentrate, at least temporarily, on a complement of
$\ker B$. Very well: let $M$ be such a complement, so that

$$M \cap \ker B = \{0\} \quad \text{and} \quad M + \ker B = V.$$

Since $B$ maps $\ker B$ to $\{0\}$, the image of $M$ under $B$ is equal to the
entire range of $B$, and since $M$ has only 0 in common with $\ker B$, the
mapping $B$ restricted to $M$ is one-to-one. It follows that for each vector
$x$ there exists a vector $z$ in $M$ such that $Ax = Bz$, and, moreover,
there is only one such $z$; it is now safe to yield to temptation and define
$Tx$ to be $z$. The conceptual difficulties are over; the rest consists of a
routine verification that the transformation $T$ so defined is indeed linear
(and, even more trivially, that $A = BT$).

As for right divisibility, $A = SB$, the implication from there to
$\ker B \subset \ker A$ is obvious; all that remains is to prove the
converse. A little experimentation with the ideas of the preceding proof will
reveal that the right thing to consider this time is a complement $N$ of
$\operatorname{ran} B$. For any vector $x$ in $\operatorname{ran} B$, that
is, for any vector of the form $By$, define $Sx$ to be $Ay$. Does that make
sense? Couldn't it happen that one and the same $x$ is equal to both $By_1$
and $By_2$, so that $Sx$ is defined ambiguously to be either $Ay_1$ or
$Ay_2$? Yes, it could, but no ambiguity would result. The reason is that if
$By_1 = By_2$, so that $y_1 - y_2 \in \ker B$, then the assumed condition
implies that $y_1 - y_2 \in \ker A$, and hence that $Ay_1 = Ay_2$. Once $S$
is defined on $\operatorname{ran} B$, it is easy to extend it to all of $V$
just by setting it equal to 0 on $N$. The rest consists of a routine
verification that the transformation $S$ so defined is indeed linear (and,
even more trivially, that $A = SB$).


Solution 59. 


The questions have interesting and useful answers in the finite-dimensional 
case; it is, therefore, safe and wise to assume that the underlying vector 
space is finite-dimensional. 

(1) If the result of applying a linear transformation $A$ to each vector
in a total set is known, then the entire linear transformation is known. It
is instructive to examine that statement in a simple special case; suppose
that the underlying vector space is $\mathbb{R}^2$. If

$$A(1, 0) = (\alpha, \gamma)$$

and

$$A(0, 1) = (\beta, \delta)$$

(there is a reason for writing the letters in a slightly non-alphabetic order
here), then

$$A(x, y) = x(\alpha, \gamma) + y(\beta, \delta) = (\alpha x + \beta y, \gamma x + \delta y)$$

(and the alphabet has straightened itself out).

The reasoning works backwards too. Given $A$, corresponding scalars
$\alpha$, $\beta$, $\gamma$, $\delta$ can be found (uniquely); given scalars
$\alpha$, $\beta$, $\gamma$, $\delta$, a corresponding linear transformation
$A$ can be found (uniquely).

The space $\mathbb{R}^2$ plays no special role in this examination; every
2-dimensional space behaves the same way. And the number 2 plays no special
role here; any finite-dimensional space behaves the same way. The only 
difference between the low and the high dimensions is that in the latter 
more indices (and therefore more summations) have to be juggled. Here 
is how the juggling looks. 

Given: a linear transformation $A$ on a vector space $V$ with a prescribed
total set, and an arbitrary vector $x$ in $V$. Procedure: express $x$ as a
linear combination of the vectors in the total set, and deduce that the
result of applying $A$ to $x$ is the same linear combination of the results
of applying $A$ to the vectors of the total set. If, in particular, $V$ is
finite-dimensional, with basis $\{e_1, e_2, \ldots, e_n\}$, then a linear
transformation $A$ is uniquely determined by specifying $Ae_j$ for each $j$.
The image $Ae_j$ is, of course, a linear combination of the $e_i$'s, and, of
course, the coefficient of $e_i$ in its expansion depends on both $i$ and
$j$. Consequence: $Ae_j$ has the form $\sum_{i=1}^n \alpha_{ij} e_i$. In
reverse: given an array of scalars $\alpha_{ij}$ ($i = 1, \ldots, n$;
$j = 1, \ldots, n$), a unique linear transformation $A$ is defined by
specifying that

$$Ae_j = \sum_{i=1}^n \alpha_{ij} e_i$$

for each $j$. Indeed, if

$$x = \sum_{j=1}^n \xi_j e_j,$$

then

$$Ax = \sum_{j=1}^n \xi_j Ae_j = \sum_{j=1}^n \xi_j \sum_{i=1}^n \alpha_{ij} e_i = \sum_{i=1}^n \Bigl( \sum_{j=1}^n \alpha_{ij} \xi_j \Bigr) e_i.$$

The conclusion is that there is a natural one-to-one correspondence
between linear transformations $A$ on a vector space of dimension $n$ and
square arrays (matrices) $(\alpha_{ij})$ ($i = 1, \ldots, n$;
$j = 1, \ldots, n$). Important comment: linear combinations of linear
transformations correspond to the same linear combinations of arrays. If,
that is,

$$Ae_j = \sum_{i=1}^n \alpha_{ij} e_i \quad \text{and} \quad Be_j = \sum_{i=1}^n \beta_{ij} e_i,$$

then

$$(\alpha A + \beta B)e_j = \sum_{i=1}^n (\alpha \alpha_{ij} + \beta \beta_{ij}) e_i.$$

Each array $(\alpha_{ij})$ has $n^2$ entries; except for the double subscripts
(which are hardly more than a matter of handwriting) the $\alpha_{ij}$'s are
the coordinates of a vector in $\mathbb{R}^{n^2}$. Conclusion: the vector
space $L(V)$ is finite-dimensional; its dimension is $n^2$.
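
The correspondence between a transformation and its array can be acted out in a few lines. The sketch below is only an illustration for $\mathbb{R}^2$ with made-up numerical values for $\alpha$, $\beta$, $\gamma$, $\delta$; the helper names are hypothetical. The image of the $j$-th basis vector becomes the $j$-th column of the array.

```python
def matrix_from_images(images):
    # images[j] holds the coordinates of A e_j; it becomes column j.
    n = len(images)
    return [[images[j][i] for j in range(n)] for i in range(n)]

def apply(matrix, x):
    # Coordinates of A x: the i-th one is sum over j of alpha_ij * xi_j.
    return [sum(a * xi for a, xi in zip(row, x)) for row in matrix]

alpha, beta, gamma, delta = 2, 3, 5, 7
A = matrix_from_images([(alpha, gamma), (beta, delta)])  # A(1,0), A(0,1)

x, y = 10, 1
image = apply(A, (x, y))
```

The result agrees with the formula $(\alpha x + \beta y, \gamma x + \delta y)$, and the letters in the array appear in alphabetical order row by row.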

(2) Consider the linear transformations $1, A, A^2, \ldots, A^{n^2}$. They
constitute $n^2 + 1$ elements of the vector space $L(V)$ of dimension $n^2$,
and, consequently, they must be linearly dependent. The assertion of linear
dependence is the assertion of the existence of scalars
$\alpha_0, \alpha_1, \ldots, \alpha_{n^2}$, not all zero, such that

$$\alpha_0 + \alpha_1 A + \cdots + \alpha_{n^2} A^{n^2} = 0,$$

and that, in turn, is the assertion of the existence of a polynomial

$$p(t) = \alpha_0 + \alpha_1 t + \cdots + \alpha_{n^2} t^{n^2}$$

such that $p(A) = 0$. Conclusion: yes, there always exists a non-zero
polynomial $p$ such that $p(A) = 0$.

(3) If $A$ is defined by $Ax = y_0(x)x_0$, then

$$A^2 x = A[Ax] = y_0(x) Ax_0 = y_0(x)[y_0(x_0)x_0] = y_0(x_0) Ax.$$

In other words: $A^2 x$ is a scalar multiple (by the scalar $y_0(x_0)$) of
$Ax$, or, simpler said, $A^2$ is a scalar multiple (by the scalar
$y_0(x_0)$) of $A$. Differently expressed, the conclusion is that if $p$ is
the polynomial (of degree 2) defined by

$$p(t) = t^2 - y_0(x_0)t,$$

then $p(A) = 0$; the answer to the question is 2.
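
A small numerical instance may help. In the sketch below the vector $x_0$ and the linear functional $y_0$ are made-up examples in $\mathbb{R}^2$ (the functional is given by its list of coefficients); with those assumptions the matrix of $A$ has entries $x_{0,i} \, y_{0,j}$.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

x0 = [1, 2]   # a sample vector
y0 = [3, 4]   # a sample functional: y0(x) = 3*x[0] + 4*x[1]

# Matrix of A, where A x = y0(x) * x0.
A = [[x0[i] * y0[j] for j in range(2)] for i in range(2)]

scalar = sum(c * v for c, v in zip(y0, x0))   # y0(x0)
A_squared = matmul(A, A)
scaled_A = [[scalar * entry for entry in row] for row in A]
```

Here `A_squared` and `scaled_A` coincide, which is the statement $p(A) = 0$ for $p(t) = t^2 - y_0(x_0)t$.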


Solution 60.


Suppose that $T$ is a linear transformation with inverse $T^{-1}$ on a vector
space $V$. If $v_1$ and $v_2$ are in $V$ with $Tu_1 = v_1$ and $Tu_2 = v_2$,
then

$$
\begin{aligned}
T^{-1}(v_1 + v_2) &= T^{-1}(Tu_1 + Tu_2) \\
&= T^{-1}(T(u_1 + u_2)) && \text{(because $T$ is linear)} \\
&= u_1 + u_2 && \text{(by the definition of $T^{-1}$)} \\
&= T^{-1}v_1 + T^{-1}v_2 && \text{(by the definition of $T^{-1}$),}
\end{aligned}
$$

and, similarly, if $v$ is an arbitrary vector in $V$, $\alpha$ is an
arbitrary scalar, and $v = Tu$, then

$$T^{-1}(\alpha v) = T^{-1}(\alpha(Tu)) = T^{-1}(T(\alpha u)) = \alpha u = \alpha(T^{-1}v),$$

q.e.d.


Solution 61.


(1) What is the kernel of $T$? That is: for which $\binom{\xi}{\eta}$ does it
happen that

$$T\begin{pmatrix} \xi \\ \eta \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}?$$

Exactly those for which $2\xi + \eta = 0$, or, in other words,
$\eta = -2\xi$, and that's a lot of them. The transformation $T$ has a
non-trivial kernel, and, therefore, it is not invertible.

(2) The kernel question can be raised again, and yields the answer that both
$\xi$ and $\eta$ must be 0; in other words the only $\binom{\xi}{\eta}$ in
the kernel is $\binom{0}{0}$. That suggests very strongly that $T$ is
invertible, but a really satisfying answer to the question is obtained by
forming $T^2$. Since all that $T$ does is interchange the two coordinates of
whatever vector it is working on, $T^2$ interchanges them twice, which means
that $T^2$ leaves them alone. Consequence: $T^2 = 1$, or, in other words,
$T^{-1} = T$.

(3) The differentiation transformation $D$ on $\mathbb{P}_5$ is not
invertible. Reason (as twice before in this problem): $D$ has a non-trivial
kernel. That is: there exist polynomials $p$ different from 0 for which
$Dp = 0$, namely all constant polynomials (except 0).
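

Solution 62.


Both assertions are false.

For (1), take $(\alpha)$ and $(\beta)$ to be invertible, and put
$(\gamma) = (\beta)^{-1}$, $(\delta) = (\alpha)^{-1}$. In that case

$$M = \begin{pmatrix} (\alpha) & (\beta) \\ (\beta)^{-1} & (\alpha)^{-1} \end{pmatrix},$$

which makes it obvious that all four formal determinants are equal to the
matrix 0. If, in particular,

$$(\alpha) = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \quad \text{and} \quad (\beta) = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix},$$

then

$$(\alpha)^{-1} = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix} \quad \text{and} \quad (\beta)^{-1} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix},$$

so that

$$M = \begin{pmatrix} 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & -1 & 1 & 0 \\ 0 & 1 & -1 & 1 \end{pmatrix}.$$

The point is that $M$ is invertible. Such a statement is never obvious;
something must be proved. The simplest proof is concretely to exhibit the
inverse, but the calculation of matrix inverses is seldom pure joy. Be that
as it may, here it is; in the present case

$$M^{-1} = \begin{pmatrix} -1 & 1 & 1 & 0 \\ 0 & 1 & -1 & -1 \\ 1 & 0 & -1 & -1 \\ 1 & -1 & 0 & 1 \end{pmatrix}.$$

For (2), take $(\alpha)$ involutory ($(\alpha)^2 = 1$) and $(\beta)$
nilpotent of index 2 ($(\beta)^2 = 0$), and put $(\gamma) = (\beta)$,
$(\delta) = (\alpha)$. In that case

$$M = \begin{pmatrix} (\alpha) & (\beta) \\ (\beta) & (\alpha) \end{pmatrix},$$

which makes it obvious that all four formal determinants are equal to the
identity matrix 1. If, in particular,

$$(\alpha) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad \text{and} \quad (\beta) = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix},$$

then

$$M = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{pmatrix}.$$

Since the first and third columns of $M$ are equal, so that $M$ sends the
first and third natural basis vectors to the same vector, the matrix $M$ is
not invertible.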
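
Both counterexamples can be checked by brute force. The sketch below takes the four "formal determinants" to be the four possible orderings $\alpha\delta - \beta\gamma$, $\alpha\delta - \gamma\beta$, $\delta\alpha - \beta\gamma$, $\delta\alpha - \gamma\beta$; that reading, and the helper names, are assumptions made for the illustration.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def subtract(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def block(a, b, c, d):
    # Assemble the 4x4 matrix with 2x2 blocks a, b on top and c, d below.
    return [ra + rb for ra, rb in zip(a, b)] + [rc + rd for rc, rd in zip(c, d)]

def formal_determinants(a, b, c, d):
    ad, da = matmul(a, d), matmul(d, a)
    bc, cb = matmul(b, c), matmul(c, b)
    return [subtract(ad, bc), subtract(ad, cb),
            subtract(da, bc), subtract(da, cb)]

ZERO = [[0, 0], [0, 0]]
ONE = [[1, 0], [0, 1]]
IDENTITY4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]

# Example for (1): formal determinants 0, yet M1 is invertible.
a1, b1 = [[1, 0], [1, 1]], [[1, 1], [0, 1]]
c1, d1 = [[1, -1], [0, 1]], [[1, 0], [-1, 1]]   # inverses of b1 and a1
M1 = block(a1, b1, c1, d1)
M1_inverse = [[-1, 1, 1, 0], [0, 1, -1, -1], [1, 0, -1, -1], [1, -1, 0, 1]]

# Example for (2): formal determinants 1, yet M2 has two equal columns.
a2, b2 = [[0, 1], [1, 0]], [[0, 0], [1, 0]]
M2 = block(a2, b2, b2, a2)
```

Multiplying `M1` by `M1_inverse` gives the identity, and the first and third columns of `M2` coincide, as the solution asserts.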


Solution 63. 


The problem of evaluating $\det M_1$ calls attention to a frequently usable
observation, namely that the determinant of a direct sum of matrices is the
product of their determinants. (The concept of direct sums of matrices has
not been defined; is its definition guessable from the present context?)
Since the two diagonal blocks of $M_1$ have determinants $-3$ and $-7$, it
follows that $\det M_1 = 21$.

If a matrix has two equal columns (or two equal rows?), then it is not
invertible, and, therefore, its determinant must be 0. The matrix $M_2$ has
two equal rows (for instance, the first and the fifth, and also the second
and the fourth) and therefore $\det M_2 = 0$.

The simplest trick for evaluating $\det M_3$ is to observe that $M_3$ is
similar to the direct sum of three copies of the matrix
$\begin{pmatrix} 3 & 2 \\ 2 & 3 \end{pmatrix}$. (The concept of similarity of
matrices has not been defined yet; is its definition guessable from the
present context?) The similarity is achieved by a permutation matrix. What
that means, in simple language, is that if the rows and columns of $M_3$ are
permuted suitably, $M_3$ becomes such a direct sum. Since

$$\det \begin{pmatrix} 3 & 2 \\ 2 & 3 \end{pmatrix} = 5,$$

it follows that $\det M_3 = 5^3 = 125$.


Solution 64. 


If $n = 1$, then $(1)$ is the only invertible 01-matrix and the number of its
entries equal to 1 is 1; that's an uninteresting extreme case. When $n = 2$,
the optimal example is

$$\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$$

with three 1's. What happens when $n = 3$?

The invertible matrix

$$\begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}$$

has six 1's; can that be improved? There is one and only one chance. An
extra 1 in either the second column or in the second row would ruin
invertibility; what about an extra 1 in position $(3, 1)$? It works: the
matrix

$$\begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}$$

is invertible. An efficient way to prove that is to note that its determinant
is equal to 1.

Is the general answer becoming conjecturable? The procedure is inductive,
and the general step is perfectly illustrated by the passage from 3 to 4.
Consider the $4 \times 4$ matrix

$$\begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \end{pmatrix}$$

and expand its determinant in terms of the first column. The cofactor of
the $(1, 1)$ entry is invertible by the induction assumption. The $(2, 1)$
entry is 0, and, therefore, contributes 0 to the expansion. The cofactor of
the $(k, 1)$ entry, for $k > 2$, contains two identical rows, namely the
first two rows that consist entirely of 1's; it follows that that cofactor
contributes 0 also. Consequence (by induction): the matrix is invertible.

The number of 1's in the matrix here exhibited is obtained from $n^2$ by
subtracting the number of entries in the diagonal just below the main one,
and that number is $n - 1$. This proves that the number of 1's can always be
as great as $n^2 - n + 1$.

Could it be greater? If a matrix has as many as $n^2 - n + 2$
($= n^2 - (n - 2)$) entries equal to 1, then it has at most $n - 2$ entries
equal to 0. Consequence: it must have at least two rows that have no 0's in
them at all, that is, at least two rows with nothing but 1's in them. A
matrix with two rows of 1's cannot be invertible.


Comment. Can the desired invertibilities be proved without determinants?
Yes, but the proof with determinants seems to be quite a bit simpler, and
even, in some sense, less computational.
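
For small $n$ the construction can be confirmed by direct computation. The sketch below builds the 01-matrix whose only zeros lie on the diagonal just below the main one and evaluates its determinant by cofactor expansion along the first row; the function names are illustrative, and the recursive determinant is meant only for small sizes.

```python
def subdiagonal_zero_matrix(n):
    # All entries are 1 except those in positions (i, i - 1), which are 0.
    return [[0 if i == j + 1 else 1 for j in range(n)] for i in range(n)]

def det(m):
    # Cofactor expansion along the first row (fine for small matrices).
    if len(m) == 1:
        return m[0][0]
    total = 0
    for col, entry in enumerate(m[0]):
        if entry == 0:
            continue
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * entry * det(minor)
    return total

def count_ones(m):
    return sum(sum(row) for row in m)

results = {n: (det(subdiagonal_zero_matrix(n)),
               count_ones(subdiagonal_zero_matrix(n)))
           for n in range(1, 7)}
```

For each $n$ from 1 to 6 the determinant comes out equal to 1 and the number of 1's equal to $n^2 - n + 1$.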


Solution 65. 


Yes, $L(V)$ has a basis consisting of invertible linear transformations. One
way to construct such a basis is to start with an easy one that consists of
non-invertible transformations and modify it. The easiest basis of $L(V)$ is
the set of all customary matrix units: they are the matrices $E(i, j)$ whose
$(p, q)$ entry is $\delta(i, p)\delta(j, q)$, where $\delta$ is the Kronecker
delta. (The indices $i$, $j$, $p$, $q$ here run through the values from 1 to
$n$.) In plain language: each $E(i, j)$ has all entries except one equal to
0; the non-zero entry is a 1 in position $(i, j)$. Example: if $n = 4$, then

$$E(2, 3) = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$

The $n^2$ matrices $E(i, j)$ constitute a basis for the vector space $L(V)$,
but, obviously, they are not invertible. If

$$F(i, j) = E(i, j) + 1$$

(where the symbol “1” denotes the identity matrix), then the matrices
$F(i, j)$ are invertible (that's easy) and they span $L(V)$ (that's not
obvious). Since there are $n^2$ of them, the spanning statement can be proved
by showing that the $F(i, j)$'s are linearly independent.

Suppose, therefore, that a linear combination of the $F$'s vanishes:

$$\sum_{i,j} \alpha(i, j) F(i, j) = 0,$$

or, in other words,

$$X = \sum_{i,j} \alpha(i, j) \cdot 1 + \sum_{i,j} \alpha(i, j) E(i, j) = 0.$$

If $p \ne q$, then the $(p, q)$ entry of $X$ is $0 + \alpha(p, q)$, and
therefore $\alpha(p, q) = 0$. What about the entries $\alpha(p, p)$? The
$(p, p)$ entry of $X$ is

$$\sum_{i,j} \alpha(i, j) + \alpha(p, p),$$

which is therefore 0. But it is already known that $\alpha(i, j) = 0$ when
$i \ne j$, and it follows that

$$\alpha(p, p) + \sum_i \alpha(i, i) = 0$$

for each $p$. Consequence: the $\alpha(p, p)$'s are all equal (!), and,
what's more, their common value is the negative of their sum. The only way
that can happen is to have $\alpha(p, p) = 0$ for all $p$, and that finishes
the proof that the $F$'s are linearly independent.
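
The independence of the $F$'s can also be observed numerically for small $n$. The sketch below flattens each $F(i, j)$ into a vector with $n^2$ coordinates and computes the rank of the resulting family by exact Gaussian elimination over the rationals; the 0-based indexing and the helper names are choices made for the illustration.

```python
from fractions import Fraction

def F(i, j, n):
    # The identity matrix plus the matrix unit E(i, j) (0-based indices).
    m = [[1 if r == c else 0 for c in range(n)] for r in range(n)]
    m[i][j] += 1
    return m

def rank(rows):
    # Rank by Gauss-Jordan elimination with exact rational arithmetic.
    rows = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][col] != 0:
                factor = rows[r][col] / rows[rk][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk

def flattened_F_family(n):
    # One row of length n*n for each of the n*n matrices F(i, j).
    return [[x for row in F(i, j, n) for x in row]
            for i in range(n) for j in range(n)]

family_ranks = {n: rank(flattened_F_family(n)) for n in range(1, 5)}
```

A rank of $n^2$ for the family means that the $n^2$ matrices are independent and therefore form a basis of the space of all $n \times n$ matrices; a rank of $n$ for an individual $F(i, j)$ means that it is invertible.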


Solution 66. 


The answer is that on a finite-dimensional vector space every injective
linear transformation is surjective, and vice versa.

Suppose, indeed, that $\{u_1, u_2, \ldots, u_n\}$ is a basis of a vector
space $V$ and that $T$ is a linear transformation on $V$ with kernel
$\{0\}$. Look at the transformed vectors $Tu_1, Tu_2, \ldots, Tu_n$: can
they be dependent? That is: can there exist scalars
$\alpha_1, \alpha_2, \ldots, \alpha_n$ such that

$$\alpha_1 Tu_1 + \alpha_2 Tu_2 + \cdots + \alpha_n Tu_n = 0?$$

If that happened, then (use the linearity of $T$) it would follow that

$$T(\alpha_1 u_1 + \alpha_2 u_2 + \cdots + \alpha_n u_n) = 0,$$

and hence that

$$\alpha_1 u_1 + \alpha_2 u_2 + \cdots + \alpha_n u_n = 0$$

(here is where the assumption about the kernel of $T$ is used). Since,
however, the set

$$\{u_1, u_2, \ldots, u_n\}$$

is independent, it would follow that all the $\alpha$'s are 0; in other
words, the transformed vectors

$$Tu_1, Tu_2, \ldots, Tu_n$$

are independent. An independent set of $n$ vectors in an $n$-dimensional
vector space must be a basis (if not, it could be enlarged to become one, but
then the number of elements in the enlarged basis would be different from
$n$; see Problem 42). Since a basis of $V$ spans $V$, it follows that every
vector is a linear combination of the vectors $Tu_1, Tu_2, \ldots, Tu_n$, and
hence that the range of $T$ is equal to $V$. Conclusion: $\ker T = \{0\}$
implies $\operatorname{ran} T = V$.

The reasoning in the other direction resembles the one just used. Suppose
this time that $\{u_1, u_2, \ldots, u_n\}$ is a basis of a vector space $V$
and that $T$ is a linear transformation on $V$ such that
$\operatorname{ran} T = V$. Assertion: the transformed vectors
$Tu_1, Tu_2, \ldots, Tu_n$ span $V$. Reason: since, by assumption, every
vector $v$ in $V$ is the image under $T$ of some vector $u$, and since every
vector $u$ is a linear combination of the form

$$\alpha_1 u_1 + \alpha_2 u_2 + \cdots + \alpha_n u_n,$$

it follows indeed that

$$
\begin{aligned}
v = Tu &= T(\alpha_1 u_1 + \alpha_2 u_2 + \cdots + \alpha_n u_n) \\
&= \alpha_1 Tu_1 + \alpha_2 Tu_2 + \cdots + \alpha_n Tu_n.
\end{aligned}
$$

Since a total set of $n$ vectors in an $n$-dimensional vector space must be a
basis (if not, it could be decreased to become one, but then the number of
elements in the decreased basis would be different from $n$; see Problem
42), it follows that the transformed vectors $Tu_1, Tu_2, \ldots, Tu_n$ are
independent. If now $u$ is a vector in $\ker T$, then expand $u$ in terms of
the basis $\{u_1, u_2, \ldots, u_n\}$, so that

$$u = \alpha_1 u_1 + \alpha_2 u_2 + \cdots + \alpha_n u_n,$$

infer that

$$0 = Tu = \alpha_1 Tu_1 + \alpha_2 Tu_2 + \cdots + \alpha_n Tu_n,$$

and hence that the $\alpha$'s are all 0. Conclusion:
$\operatorname{ran} T = V$ implies $\ker T = \{0\}$.


Comment. The differentiation operator $D$ on the vector space $\mathbb{P}_5$
is neither injective nor surjective; that’s an instance of the result of this section.
The differentiation operator D on the vector space P is surjective (is that 
right?), but not injective. The integration operator T (see Problem 56) is 
injective but not surjective. What’s wrong? 

The answer is that nothing is wrong; the theorem is about finite-dimen- 
sional vector spaces, and P is not one of them. 


Solution 67. 


If the dimension is 2, then there are only two ways a basis (consisting of 
two elements) can be permuted: leave its elements alone or interchange 
them. The identity permutation obviously doesn’t affect the matrix at all, 
and the interchange permutation interchanges the two columns. 

It is an easy (and familiar?) observation that every permutation can be
achieved by a sequence of interchanges of just two objects, and, in the light
of the comment in the preceding paragraph, the effect of each such
interchange is the corresponding interchange of the columns of the matrix. It
is, however, not necessary to make use of the achievability of permutations
by interchanges (technical word: transpositions); the conclusion is almost
as easy to arrive at directly. If, for instance, the dimension is 3, if a
basis is $\{e_1, e_2, e_3\}$, and if the permutation under consideration
replaces that basis by $\{e_3, e_1, e_2\}$, then that replacement turns a
matrix such as

$$\begin{pmatrix} \alpha_{11} & \alpha_{12} & \alpha_{13} \\ \alpha_{21} & \alpha_{22} & \alpha_{23} \\ \alpha_{31} & \alpha_{32} & \alpha_{33} \end{pmatrix}$$

into the matrix

$$\begin{pmatrix} \alpha_{13} & \alpha_{11} & \alpha_{12} \\ \alpha_{23} & \alpha_{21} & \alpha_{22} \\ \alpha_{33} & \alpha_{31} & \alpha_{32} \end{pmatrix}.$$


Solution 68. 

To say that $A = (\alpha_{ij})$ is a diagonal matrix is the same as saying
that $\alpha_{ij} = \alpha_{ii}\delta_{ij}$ for all $i$ and $j$ (where
$\delta_{ij}$ is the Kronecker delta, equal to 1 or 0 according as $i = j$
or $i \ne j$). If $B = (\beta_{ij})$, then the $(i, j)$ entry of $AB$ is

$$\sum_{k=1}^n \alpha_{ii}\delta_{ik}\beta_{kj} = \alpha_{ii}\beta_{ij}$$

(because the presence of $\delta_{ik}$ makes every term except the one in
which $k = i$ equal to 0), and the $(i, j)$ entry of $BA$ is

$$\sum_{k=1}^n \beta_{ik}\alpha_{kk}\delta_{kj} = \beta_{ij}\alpha_{jj}.$$

If $i \ne j$, then the assumption about the diagonal entries says that
$\alpha_{ii} \ne \alpha_{jj}$, and it follows therefore, from the
commutativity assumption, that $\beta_{ij}$ must be 0. Conclusion: $B$ is a
diagonal matrix.


Solution 69. 


If $B$ commutes with every $A$, then in particular it commutes with every
diagonal $A$ with distinct diagonal entries, and it follows therefore, from
Problem 68, that $B$ must be diagonal; in the sequel it may be assumed, with
no loss of generality, that $B$ is of the form

$$B = \begin{pmatrix} \beta_1 & 0 & 0 & 0 \\ 0 & \beta_2 & 0 & 0 \\ 0 & 0 & \beta_3 & 0 \\ 0 & 0 & 0 & \beta_4 \end{pmatrix}.$$

At the same time $B$ commutes with the matrices of all those linear
transformations that leave fixed all but two entries of the basis. In matrix
language those transformations can be described as follows: let $p$ and $q$
be any two distinct indices, and let $C$ be obtained from the identity
matrix by replacing the 1's in positions $(p, p)$ and $(q, q)$ by 0's and
replacing the 0's in positions $(p, q)$ and $(q, p)$ by 1's. Typical example
(with $n = 4$, $p = 2$, and $q = 3$):

$$C = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

Since

$$BC = \begin{pmatrix} \beta_1 & 0 & 0 & 0 \\ 0 & 0 & \beta_2 & 0 \\ 0 & \beta_3 & 0 & 0 \\ 0 & 0 & 0 & \beta_4 \end{pmatrix}$$

and

$$CB = \begin{pmatrix} \beta_1 & 0 & 0 & 0 \\ 0 & 0 & \beta_3 & 0 \\ 0 & \beta_2 & 0 & 0 \\ 0 & 0 & 0 & \beta_4 \end{pmatrix},$$

it follows that $\beta_2 = \beta_3$. It's clear (isn't it?) that the method
works in general and proves that all the $\beta$'s are equal.


Solution 70. 


Consider the linear transformation

$$\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},$$

or, more properly speaking, consider the linear transformation $A$ on
$\mathbb{R}^2$ defined by the matrix shown. Note that if
$u = (\alpha, \beta)$ is any vector in $\mathbb{R}^2$, then
$Au = (\beta, 0)$. Consequence: if $M$ is an invariant subspace that
contains a vector $(\alpha, \beta)$ with $\beta \ne 0$, then $M$ contains
$(\beta, 0)$ (and therefore $(1, 0)$), and it follows (via the formation of
linear combinations) that $M$ contains $(0, \beta)$ (and therefore
$(0, 1)$). In this case $M = \mathbb{R}^2$.

If $M$ is neither $O$ nor $\mathbb{R}^2$, then every vector in $M$ must be of
the form $(\alpha, 0)$, and the set $M_1$ of all those vectors does in fact
constitute an invariant subspace. Conclusion: the only invariant subspaces
are $O$, $M_1$, and $\mathbb{R}^2$.


Solution 71. 


If $D$ is the differentiation operator on the space $\mathbb{P}_n$ of
polynomials of degree less than or equal to $n$, and if $m < n$, then
$\mathbb{P}_m$ is a subspace of $\mathbb{P}_n$, and the subspace
$\mathbb{P}_m$ is invariant under $D$. Does $\mathbb{P}_m$ have an invariant
complement in $\mathbb{P}_n$?

The answer is no. Indeed, if $p$ is a polynomial in $\mathbb{P}_n$ that is
not in $\mathbb{P}_m$, in other words if the degree $k$ of $p$ is strictly
greater than $m$, then replace $p$ by a scalar multiple so as to justify the
assumption that $p$ is monic
($p(t) = t^k + a_{k-1}t^{k-1} + \cdots + a_0$). If $p$ belongs to a subspace
invariant under $D$, then $Dp, D^2p, \ldots$ all belong to that subspace,
and, therefore, so does the polynomial $D^{k-m}p$, which is of degree $m$.
Consequence: every polynomial not in $\mathbb{P}_m$ has the property that if
$D$ is applied to it the right number of times, the result is a non-zero
polynomial in $\mathbb{P}_m$. Conclusion: $\mathbb{P}_m$ can have no
invariant complement.


Comment. If $n = 1$, then $\mathbb{P}_n$ ($= \mathbb{P}_1$) consists of all
polynomials $\alpha + \beta t$ of degree 1 or less, and $D$ sends such a
polynomial onto the constant polynomial $\beta$ ($= \beta + 0 \cdot t$).
That is only trivially (notationally) different from the set of ordered
pairs $(\alpha, \beta)$ with the transformation that sends such a pair onto
$(\beta, 0)$; in other words, in that case the present solution reduces to
Solution 70.


Solution 72. 


A useful algebraic characterization of projections is idempotence.
Explanation: to say that a linear transformation $A$ is idempotent means
that $A^2 = A$. (The Latin forms “idem” and “potent” mean “same” and
“power”.) In other words, the assertion is that if $E$ is a projection, then
$E^2 = E$, and, conversely, if $E^2 = E$, then $E$ is a projection.

The idempotence of a projection is easy to prove. Suppose, indeed, that $E$
is the projection on $M$ along $N$. If $z = x + y$ is a vector, with $x$ in
$M$ and $y$ in $N$, then $Ez = x$, and, since $x = x + 0$, so that $Ex = x$,
it follows that $E^2 z = Ez$.

Suppose now that $E$ is an idempotent linear transformation, and let $M$ and
$N$ be the range and the kernel of $E$ respectively. Both $M$ and $N$ are
subspaces; that's known. If $z$ is in $M$, then, by the definition of range,
$z = Eu$ for some vector $u$, and if $z$ is also in $N$, then, by the
definition of kernel, $Ez = 0$. Since $E = E^2$, the application of $E$ to
both sides of the equation $z = Eu$ implies that $Ez = z$; since, at the same
time, $Ez = 0$, it follows that $z = 0$. Conclusion: $M \cap N = O$.

If $z$ is an arbitrary vector in $V$, consider the vectors

$$Ez \quad \text{and} \quad z - Ez \ (= (1 - E)z);$$

call them $x$ and $y$. The vector $x$ is in $\operatorname{ran} E$, and,
since

$$Ey = Ez - E^2 z = 0,$$

the vector $y$ is in $\ker E$. Since $z = x + y$, it follows that

$$M + N = V.$$

The preceding two paragraphs between them say exactly that $M$ and $N$ are
complementary subspaces and that the projection of any vector $z$ to $M$
along $N$ is equal to $Ez$; that settles everything. Note, in particular,
that the argument answers both questions: projections are just the
idempotent linear transformations, and if $E$ is the projection on $M$ along
$N$, then $\operatorname{ran} E = M$ and $\ker E = N$.

It is sometimes pleasant to know that if $E$ is a projection, then
$\operatorname{ran} E$ consists exactly of the fixed points of $E$. That is:
if $z$ is in $\operatorname{ran} E$, then $Ez = z$, and, trivially, if
$Ez = z$, then $z$ is in $\operatorname{ran} E$.


Solution 73. 
If $E$ and $F$ are projections such that $E + F$ also is a projection, then

$$(E + F)^2 = E + F,$$

which says, on multiplying out, that

$$EF + FE = 0.$$

Multiply this equation on both left and right by $E$ and get

$$EF + EFE = 0 \quad \text{and} \quad EFE + FE = 0.$$

Subtract one of these equations from the other and conclude that

$$EF - FE = 0,$$

and hence (since both the sum and the difference vanish)

$$EF = FE = 0.$$

That's a necessary condition that $E + F$ be a projection.

It is much easier to prove that the condition is sufficient also: if it is
known that $EF = FE = 0$, then the cross product terms in $(E + F)^2$
disappear, and, in view of the idempotence of $E$ and $F$ separately, it
follows that $E + F$ is idempotent.

Conclusion: the sum of two projections is a projection if and only if
their products are 0. (Careful: two products, one in each order.)


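Question. Can the product of two projections be 0 in one order but not 
the other? Yes, and that takes only a little thought and a little experimental 
search. If 

\[ E = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad F = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}, \]

and EF = 0, then α = β = 0. The resulting 

\[ F = \begin{pmatrix} 0 & 0 \\ \gamma & \delta \end{pmatrix} \]

is idempotent if and only if either γ = δ = 0 or else δ = 1. A pertinent 
example is 

\[ F = \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix}; \]

in that case EF = 0 and FE ≠ 0. 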
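
The example can be checked mechanically. The following is a minimal sketch in plain Python, assuming E is the matrix with rows (1, 0), (0, 0) and F the matrix with rows (0, 0), (1, 1); the helper names `matmul` and `add` are illustrative only.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    """Add two matrices of the same shape entry by entry."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

E = [[1, 0], [0, 0]]
F = [[0, 0], [1, 1]]
ZERO = [[0, 0], [0, 0]]

EF = matmul(E, F)   # product in one order
FE = matmul(F, E)   # product in the other order
S = add(E, F)       # the sum E + F

e_idempotent = matmul(E, E) == E
f_idempotent = matmul(F, F) == F
sum_idempotent = matmul(S, S) == S
```

Both E and F come out idempotent, EF vanishes, FE does not, and accordingly E + F fails to be idempotent, in agreement with the criterion of Solution 73.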


Solution 74. 


The condition is that E² = E³; a strong way for a linear transformation to 
satisfy that is to have E² = 0. Is it possible to have E² = 0 without E = 0? 
Sure; a standard easy example is 

\[ E = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}. \]

In that case, indeed, E²(1 − E) = 0, but E(1 − E) = 0 is false. That settles 
the first question. 

It is easy to see that the answer to the second question is no: for the 
E just given it is not true that E(1 − E)² = 0 (because, in fact, E(1 − E)² = 
E − 2E² + E³ = E ≠ 0). 

That answers both questions, but it does not answer all the natural 
questions that should be asked. 

One natural question is this: if E(1 − E)² = 0, does it follow that E is 
idempotent? No; how could it? Just replace the E used above by 1 − E, 
that is, use 

\[ \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} \]

as the new E. Then 

\[ 1 - E = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \]

so that (1 − E)² = 0, and therefore E(1 − E)² = 0, but it is not true that 
E² = E. 

Another natural question: if both 

\[ E^2(1 - E) = 0 \quad\text{and}\quad E(1 - E)^2 = 0, \]

does it follow that E is idempotent? Sure: add the two equations and 
simplify to get E − E² = 0. 
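
The two counterexamples of this solution can be verified by direct computation. The sketch below assumes the matrices are the nilpotent E with rows (0, 1), (0, 0) and its complement 1 − E with rows (1, −1), (0, 1); the function `check` and its dictionary keys are hypothetical names chosen for illustration.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def sub(a, b):
    """Subtract matrix b from matrix a entry by entry."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

IDENTITY = [[1, 0], [0, 1]]
ZERO = [[0, 0], [0, 0]]

def check(E):
    """Report which of the three conditions of Solution 74 hold for E."""
    one_minus_e = sub(IDENTITY, E)
    return {
        "E^2(1-E)=0": matmul(matmul(E, E), one_minus_e) == ZERO,
        "E(1-E)^2=0": matmul(E, matmul(one_minus_e, one_minus_e)) == ZERO,
        "E^2=E": matmul(E, E) == E,
    }

first = check([[0, 1], [0, 0]])    # the nilpotent example
second = check([[1, -1], [0, 1]])  # the same example with E replaced by 1 - E
```

Each matrix satisfies exactly one of the two cubic conditions and neither is idempotent, which is why both conditions together are needed to force E² = E.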


Chapter 5. Duality 


Solution 75. 


If η = 0, then ξ = 0, everything is trivial and the conclusion is true. In 
the remaining case, consider a vector x_0 such that η(x_0) ≠ 0, and reason 
backward. That is, assume for a moment that there does exist a scalar α 
such that ξ(x) = αη(x) for all x, and that therefore, in particular, ξ(x_0) = 
αη(x_0), and infer that 

\[ \alpha = \frac{\xi(x_0)}{\eta(x_0)}. \]

[Note, not a surprise, but pertinent: it doesn't matter which x_0 was picked; 
so long as η(x_0) ≠ 0, the fraction gives the value of α. Better said: if 
there is an α, it is uniquely determined by the linear functionals ξ and η.] 

Now start all over again, and go forward (under the permissible 
assumption that there exists a vector x_0 such that η(x_0) ≠ 0). The linear 
functional η sends 

\[ x_0 \quad\text{to}\quad \eta(x_0), \]

and hence it sends 

\[ \frac{x_0}{\eta(x_0)} \quad\text{to}\quad 1, \]

and hence 

\[ \frac{\gamma x_0}{\eta(x_0)} \quad\text{to}\quad \gamma \]

for every scalar γ. Special case: η sends 

\[ \frac{\eta(x)\,x_0}{\eta(x_0)} \quad\text{to}\quad \eta(x) \]

for every vector x. Consequence: 

\[ \eta\Bigl(x - \frac{\eta(x)}{\eta(x_0)}\,x_0\Bigr) = 0 \]

for all x. The relation between ξ and η now implies that 

\[ \xi\Bigl(x - \frac{\eta(x)}{\eta(x_0)}\,x_0\Bigr) = 0 \]

for all x, which says exactly that 

\[ \xi(x) = \frac{\xi(x_0)}{\eta(x_0)}\,\eta(x), \]

and that is what was wanted (with α = ξ(x_0)/η(x_0)). 


Solution 76. 


Yes, the dual of a finite-dimensional vector space V is finite-dimensional, 
and the way to prove it is to use a basis of V to construct a basis of V'. 
Suppose, indeed, that {x_1, x_2, ..., x_n} is a basis of V. Plan: find linear 
functionals ξ_1, ξ_2, ..., ξ_n that "separate" the x's in the sense that 


ξ_i(x_j) = δ_ij 


for each i, j = 1, 2, ..., n. Can that be done? If it could, then the value of 
ξ_j at a typical vector 


x = α_1 x_1 + α_2 x_2 + ··· + α_n x_n 


would be α_j, and that shows how ξ_j should be defined when it is not yet 
known. That is: writing ξ_i(x) = α_i for each i does indeed define a linear 
functional (verification?). 

The linear functionals ξ_1, ξ_2, ..., ξ_n are linearly independent. Proof: if 


β_1 ξ_1(x) + β_2 ξ_2(x) + ··· + β_n ξ_n(x) = 0 


for all x, then, in particular, the linear combination vanishes when x = x_j 
(j = 1, ..., n), which says exactly that β_j = 0 for each j. 

Every linear functional is a linear combination of the ξ_j's. Indeed, if ξ 
is an arbitrary linear functional and x = α_1 x_1 + α_2 x_2 + ··· + α_n x_n is an 
arbitrary vector, then 


ξ(x) = ξ(α_1 x_1 + α_2 x_2 + ··· + α_n x_n) 
= α_1 ξ(x_1) + α_2 ξ(x_2) + ··· + α_n ξ(x_n) 
= ξ_1(x) ξ(x_1) + ξ_2(x) ξ(x_2) + ··· + ξ_n(x) ξ(x_n) 
= (ξ(x_1) ξ_1 + ξ(x_2) ξ_2 + ··· + ξ(x_n) ξ_n)(x). 


The preceding two paragraphs yield the conclusion that the ξ's constitute 
a basis of V' (and hence that V' is finite-dimensional). 


Corollary. dim V' = dim V = n. 
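
For a concrete feel of the construction, here is a small sketch in plain Python, under the assumption that V is the space of rational coordinate triples and that a linear functional is represented by its row of coefficients. In that setting the dual basis consists of the rows of the inverse of the matrix whose columns are the basis vectors; the basis chosen and the names `inverse`, `dual_basis`, and `evaluate` are illustrative, not part of the text.

```python
from fractions import Fraction

def inverse(m):
    """Invert a square matrix by Gauss-Jordan elimination over the rationals."""
    n = len(m)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def dual_basis(basis):
    """Return coefficient rows xi_1, ..., xi_n with xi_i(x_j) = delta_ij."""
    n = len(basis)
    columns = [[basis[j][i] for j in range(n)] for i in range(n)]  # x_j is column j
    return inverse(columns)

def evaluate(functional, vector):
    """Value of a functional (coefficient row) at a vector."""
    return sum(c * v for c, v in zip(functional, vector))

basis = [(1, 0, 0), (1, 1, 0), (1, 1, 1)]
xi = dual_basis(basis)
table = [[evaluate(xi[i], basis[j]) for j in range(3)] for i in range(3)]

# Expand an arbitrary functional as sum_j functional(x_j) * xi_j.
functional = (3, -1, 2)
coefficients = [evaluate(functional, x) for x in basis]
rebuilt = [sum(coefficients[j] * xi[j][k] for j in range(3)) for k in range(3)]
```

The table of values ξ_i(x_j) is the identity matrix, and the rebuilt coefficient row agrees with the functional that was expanded, which is the spanning argument of the solution in miniature.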


Solution 77. 


The answer is yes: the linear transformation T defined by 


T(x) = T(x_1, ..., x_n) = (y_1(x), ..., y_n(x)) 


is invertible. One reasonably quick way to prove that is to examine the 
kernel of T. Suppose that x = (x_1, ..., x_n) is a vector in F^n that belongs 
to the kernel of T, so that 


(y_1(x), ..., y_n(x)) = (0, ..., 0). 


Since the coordinate projections p_j belong to the span of y_1, ..., y_n, it 
follows that for each j there exist scalars α_1, ..., α_n such that 

\[ p_j = \sum_k \alpha_k y_k. \]

Consequence: 

\[ p_j(x) = \sum_k \alpha_k y_k(x) = 0 \]

for each j, which implies, of course, that x = 0; in other words the kernel 
of T is {0}. Conclusion: T is invertible. 


Solution 78. 


The verification that T is linear is the easiest step. Indeed: if x and y are 
in V and α and β are scalars, then 


T(αx + βy)(u) = u(αx + βy) = αu(x) + βu(y) 
= α(Tx)(u) + β(Ty)(u) 
= (αTx + βTy)(u). 


How can it happen (for a vector x in V) that Tx = 0 (in V")? Answer: it 
happens when 


u(x) = 0 


for every linear functional u on V, and that must imply that x = 0 (see the 
discussion preceding Problem 74). Consequence: T is always a one-to-one 
mapping from V to V". 

The only question that remains to be asked and answered is whether 
or not T maps V onto V", and in the finite-dimensional case the answer is 
easily accessible. The range of T is a subspace of V" (Problem 55); since 
T is an isomorphism from V to ran T, the dimension of ran T is equal to 
the dimension of V. The dimension of V" is equal to the dimension of 
V also (because dim V = dim V' and dim V' = dim V"). A subspace of 
dimension n in a vector space of dimension n cannot be a proper subspace. 
Consequence: ran T = V". Conclusion: the natural mapping of a 
finite-dimensional vector space to its double dual is an isomorphism, or, in other 
words, every finite-dimensional vector space is reflexive. 


Solution 79. 


Some proofs in mathematics require ingenuity, and others require nothing 
more than remembering and using the definitions—this one begins with 
a tiny inspiration and then finishes with the using-the-definitions kind of 
routine. 

Choose a basis {x_1, x_2, ..., x_n} for V so that its first m elements are 
in M (and therefore form a basis for M); let {u_1, u_2, ..., u_n} be the dual 
basis in V' (see Solution 75). Since u_i(x_j) = δ_ij, it follows that the u_i's with 
i > m annihilate M and with i ≤ m do not. In other words, if the span of 
the u_i's with i > m is called N, then N ⊂ M°. 

If, on the other hand, u is in M°, then, just because it is in the space 
V', the linear functional u is a linear combination of u_i's. Since any such 
linear combination 


u = β_1 u_1 + β_2 u_2 + ··· + β_n u_n 


applied to one of the x_j's with j ≤ m yields 0 (because x_j is in M) and, 
at the same time, yields β_j (because u_i(x_j) = 0 when i ≠ j), it follows 
that the coefficients of the early u_i's are all 0. Consequence: u is a linear 
combination of the latter u_i's, or, in other words, u is in N, or, better still, 
M° ⊂ N. 

The conclusions of the preceding two paragraphs imply that M° = N, 
and hence, since N has a basis of n − m elements, that 


dim M° = n − m. 


Solution 80. 


If the spaces V and V" are identified (as suggested by Problem 77), then, 
by definition, M°° consists of the set of all those vectors x in V such that 
u(x) = 0 for all u in M°. Since, by the definition of M°, the equation u(x) = 0 
holds for all x in M and all u in M°, so that every x in M satisfies the 
condition just stated for belonging to M°°, it follows that M ⊂ M°°. If 
the dimension of V is n and the dimension of M is m, then (see Problem 
78) the dimension of M° is n − m, and therefore, by the same result, the 
dimension of M°° is n − (n − m) = m. In other words M is an m-dimensional 
subspace of an m-dimensional space M°°, and that implies that M and M°° 
must be the same. 


Solution 81. 


Suppose that A is a linear transformation on a finite-dimensional vector 
space V and A’ is its adjoint on V’. If u is an arbitrary vector in ker A’, so 
that A'u = 0, then, of course (A'u)(z) = 0 for every z in V, and conse- 
quently u(Az) = 0 for every z in V. The latter equation says exactly that u 
takes the value 0 at every vector in the range of A, or, simpler said, that u 
belongs to (ran A)°. The argument is reversible: if u belongs to (ran A)°, 
so that u(Az) = 0 for every z, then (A'u)(z) = 0 for every z, and therefore 
A'u = 0, or, simpler said, u belongs to ker A'. Conclusion: 


ker A’ = (ran A)°. 


It should not come as too much of a surprise that annihilators enter. The 
range and the kernel of A are subspaces of V, and the range and the kernel 
of A' are subspaces of V'; what possible relations can there be between 
subspaces of V and subspaces of V'? The only known kind of relation (at 
least so far) has to do with annihilators. 
If A is replaced by A' in the equation just derived, the result is 


ker A" = (ran A')°, 


an equation that seems to give some information about ran A', and that's 
good. The information is, however, indirect (via the annihilator), and it is 
expressed indirectly (in terms of A" instead of A). Both of these blemishes 
can be removed. If V" is identified with V (remember reflexivity), then A" 
becomes A, and if the annihilator of both sides of the resulting equation is 
formed (remember double annihilators), the result is 


ran A' = (ker A)°. 


Question. Was finite-dimensionality needed in the argument? Sure: the 
second paragraph made use of reflexivity. What about the first paragraph 
—is finite-dimensionality needed there? 


Solution 82. 


What is obvious is that the adjoint of a projection is a projection. The rea- 
son is that projections are characterized by idempotence (Problem 71), and 
idempotence is inherited by adjoints. 

Problem 71 describes also what a projection is “on” and “along”: it 
says that if E is the projection on M along N, then 


N = ker E 
and 
M = ran E. 
It is a special case of the result of Solution 80 that 
ker E' = (ran E)° 
and 
ran E' = (ker E)°. 


Consequence: E' is the projection on N° along M°. 


Solution 83. 


Suppose that A is a linear transformation on a finite-dimensional vector 
space V, with basis {e_1, ..., e_n}, and consider its adjoint A' on the dual 
space V', with the dual basis {u_1, ..., u_n}. What is wanted is to compare 
the expansion of each A'u in terms of the u’s with the expansion of each Ae 
in terms of the e’s. The choice of notation should exercise some alphabetic 
care; this is a typical case where subscript juggling cannot be avoided, and 
carelessness with letters can make them step on their own tails in a con- 
fusing manner. 

The beginning of the program is easy enough to describe: expand A'u_j 
in terms of the u's, and compare the result with what happens when Ae_i 
is expanded in terms of the e's. The alphabetic care is needed to make 
sure that the "dummy variable" used in the summation is a harmless one, 
meaning, in particular, that it doesn't collide with either j or i. Once that's 
said, things begin to roll: write 

\[ A'u_j = \sum_k \alpha'_{kj} u_k, \]

evaluate the result at each e_i, and do what the notation almost seems to 
force: 

\[ A'u_j(e_i) = \sum_k \alpha'_{kj} u_k(e_i) = \sum_k \alpha'_{kj} \delta_{ki} = \alpha'_{ij}. \]

All right; that gives an expression for the matrix entries α'_ij of A'; what is 
to be done next? Answer: recall the way the matrix entries are defined for 
A, and hope that the two expressions together give the desired information. 
That is: look at 

\[ Ae_i = \sum_k \alpha_{ki} e_k, \]

apply each u_j, and get 

\[ u_j(Ae_i) = u_j\Bigl(\sum_k \alpha_{ki} e_k\Bigr) = \sum_k \alpha_{ki} u_j(e_k) = \sum_k \alpha_{ki} \delta_{jk} = \alpha_{ji}. \]

Since 
u_j(Ae_i) = A'u_j(e_i), 
it follows that 

\[ \alpha'_{ij} = \alpha_{ji} \]

for all i and j. Victory: that's a good answer. It says that the matrix entries of 
A’ are the same as the matrix entries of A with the subscripts interchanged. 
Equivalently: the matrix of A’ is the same as the matrix of A with the rows 
and columns interchanged. Still better (and this is the most popular point 
of view): the matrix of A' is obtained from the matrix of A by flipping it 
over the main diagonal. In the customary technical language: the matrix of 
A’ is the transpose of the matrix of A. 
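
As an illustration, the identity u(Ax) = (A'u)(x) can be checked on coordinates. The sketch below assumes the natural basis of a three-dimensional coordinate space and its dual basis, so that a functional is a coefficient row and the coordinates of A'u are obtained by applying the transposed matrix; the matrix A and the helper names are arbitrary choices for the example.

```python
def apply(m, v):
    """Apply the matrix m (list of rows) to the coordinate vector v."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

def transpose(m):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*m)]

def dot(u, x):
    """Value of the functional with coefficients u at the vector x."""
    return sum(a * b for a, b in zip(u, x))

A = [[1, 2, 0],
     [3, 4, 5],
     [0, 6, 7]]
A_adjoint = transpose(A)  # candidate matrix of A' in the dual basis

n = 3
unit = [[int(i == j) for i in range(n)] for j in range(n)]  # e_j and, dually, u_j

# (A'u_j)(e_i) should equal u_j(A e_i) for every pair of indices.
agree = all(
    dot(apply(A_adjoint, unit[j]), unit[i]) == dot(unit[j], apply(A, unit[i]))
    for i in range(n) for j in range(n)
)
```

The flag `agree` confirms, for this one matrix, that the entry in row i and column j of the adjoint's matrix is the entry in row j and column i of A.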


Chapter 6. Similarity 


Solution 84. 


The interesting and useful feature of the relation between x and y is the 
answer to this question: how does one go from x to y? To "go" means 
(and that shouldn't be a surprise) to apply a linear transformation. The 
natural way to go that offers itself is the unique linear transformation T 
determined by the equations 


Tx_1 = y_1, 
..., 
Tx_n = y_n. 
The linear transformation T has the property that Tx = y; indeed 
Tx = T(α_1 x_1 + ··· + α_n x_n) 
= α_1 Tx_1 + ··· + α_n Tx_n 
= α_1 y_1 + ··· + α_n y_n = y. 


The answer to the original question, expressed in terms of T, is 
therefore simply this: the relation between x and y is that Tx = y. That is: a 
“change of basis” is effected by the linear transformation that changes one 
basis to another. 


Question. Is T invertible? 


Solution 85. 


The present question compares with the one in Problem 83 the way 
matrices compare with linear transformations. The useful step in the solution 
of Problem 83 was to introduce the linear transformation T that sends the 
x's to the y's. Question: what is the matrix of that transformation (with 
respect to the basis {x_1, ..., x_n})? The answer is obtained (see Solution 59) 
by applying T to each x_j and expanding the result in terms of the x's. If 

\[ y_j = Tx_j = \sum_i \alpha_{ij} x_i, \]

then 

\[ \sum_j \eta_j y_j = \sum_j \eta_j \sum_i \alpha_{ij} x_i = \sum_i \Bigl(\sum_j \alpha_{ij} \eta_j\Bigr) x_i. \]

Since, however, by assumption 

\[ \sum_i \xi_i x_i = \sum_j \eta_j y_j, \]

it follows that 

\[ \xi_i = \sum_j \alpha_{ij} \eta_j. \]

That's the answer: the relation that the question asks for is that the ξ's can 
be calculated from the η's by an application of the matrix (α_ij). 
Equivalently: a change of basis is effected by the matrix that changes one 
coordinate system to another. 


Solution 86. 


The effective tool that solves the problem is the same linear transformation 
T that played an important role in Solutions 84 and 85, the one that sends 
the x's to the y's. If, that is, 


Tx_j = y_j (j = 1, ..., n), 
then 
Cy_j = CTx_j 


and 

\[ Cy_j = \sum_i \alpha_{ij} y_i = \sum_i \alpha_{ij} Tx_i = T\Bigl(\sum_i \alpha_{ij} x_i\Bigr) = TBx_j. \]

Consequence: 
CTx_j = TBx_j 
for all j, so that 


CT = TB. 
That's an acceptable answer to the question, but the usual formulation 
of the answer is slightly different. Solution 84 ended by a teasing puzzle that 
asked whether T is invertible. The answer is obviously yes: T sends a basis 
(namely the x's) onto a basis (namely the y's), and that guarantees 
invertibility. In view of that fact, the relation between C and B can be written in 
the form 


C = TBT⁻¹, 


and that is the usual equation that describes the similarity of B and C. 

The last phrase requires a bit more explanation. What it is intended to 
convey is that B and C are similar if and only if there exists an invertible 
transformation T such that C = TBT⁻¹. The argument so far has proved 
only the "only if". The other direction, the statement that if an invertible 
T of the sort described exists, then B and C are indeed similar, is not 
immediately obvious but it is pretty easy. It is to be proved that if T exists, 
then B and C do indeed correspond to the same matrix via two different 
bases. All right; assume the existence of a T, write B as a matrix in terms 
of an arbitrary basis {x_1, ..., x_n}, so that 

\[ Bx_j = \sum_i \alpha_{ij} x_i, \]

define a bunch of vectors y by writing Tx_j = y_j (j = 1, ..., n), and then 
compute as follows: 

\[ Cy_j = CTx_j = TBx_j = T\Bigl(\sum_i \alpha_{ij} x_i\Bigr) = \sum_i \alpha_{ij} Tx_i = \sum_i \alpha_{ij} y_i. \]

Conclusion: the matrix of B with respect to the x's is the same as the matrix 
of C with respect to the y's. 
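
A numerical instance may make the two directions easier to see. The sketch below assumes a particular invertible T with integer inverse and a particular B (both chosen only for illustration), takes the x's to be the natural basis, forms C = TBT⁻¹, and checks that C acts on the vectors y_j = Tx_j with the same coefficients that B uses on the x's.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def apply(m, v):
    """Apply the matrix m to the coordinate vector v."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

T = [[1, 1], [0, 1]]
T_inv = [[1, -1], [0, 1]]      # inverse of T, checked in the test below
B = [[2, 1], [0, 3]]           # matrix of B with respect to the natural basis

C = matmul(matmul(T, B), T_inv)

# y_j = T x_j is column j of T when the x's are the natural basis vectors.
y = [[T[i][j] for i in range(2)] for j in range(2)]

# C y_j should equal sum_i B[i][j] * y_i, i.e. C has matrix B in the basis y.
same_matrix = all(
    apply(C, y[j]) == [sum(B[i][j] * y[i][k] for i in range(2)) for k in range(2)]
    for j in range(2)
)
```

Here `same_matrix` holds and CT = TB, so the one matrix B describes both transformations, each in its own basis.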


Solution 87. 


Some notation needs to be set up. The assumption is that one linear trans- 
formation is given, call it B, and two bases 


{x_1, ..., x_n} and {y_1, ..., y_n}. 


Each basis can be used to express B as a matrix, 

\[ Bx_j = \sum_i \beta_{ij} x_i \quad\text{and}\quad By_j = \sum_i \gamma_{ij} y_i, \]

and the question is about the relation between the β's and the γ's. 

The transformation T that has been helpful in the preceding three 
problems (Tx_j = y_j) can still be helpful, but this time (because 
temporarily matrices are at the center of the stage, not linear transformations), it 
is advisable to express its action in matrix language. The matrix of T with 
respect to the x's is defined by 

\[ Tx_j = \sum_k \tau_{kj} x_k, \]

and now the time has come to compute with it. Here goes: 

\[ By_j = BTx_j = B\sum_k \tau_{kj} x_k = \sum_k \tau_{kj} Bx_k = \sum_k \tau_{kj} \sum_i \beta_{ik} x_i = \sum_i \Bigl(\sum_k \beta_{ik} \tau_{kj}\Bigr) x_i, \]

and 

\[ By_j = \sum_k \gamma_{kj} y_k = \sum_k \gamma_{kj} Tx_k = \sum_k \gamma_{kj} \sum_i \tau_{ik} x_i = \sum_i \Bigl(\sum_k \tau_{ik} \gamma_{kj}\Bigr) x_i. \]

Consequence: 

\[ \sum_k \tau_{ik} \gamma_{kj} = \sum_k \beta_{ik} \tau_{kj}. \]

In an abbreviated but self-explanatory form the last equation asserts a 
relation between the matrices β and γ, namely that 
τγ = βτ. 
The invertibility of τ permits this to be expressed in the form 
γ = τ⁻¹βτ, 


and, once again, the word similarity can be used: the matrices β and γ are 
similar. 


Solution 88. 


Yes, it helps to know that at least one of B and C is invertible; in that case 
the answer is yes. If, for instance, B is invertible, then 


BC = BC(BB⁻¹) = B(CB)B⁻¹; 


the argument in case C is invertible is a trivial modification of this one. 
If neither B nor C is invertible, the conclusion is false. Example: if 

\[ B = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \quad\text{and}\quad C = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \]

then 


BC = C and CB = 0. 


The important part of this conclusion is that BC ≠ 0 but CB = 0. 
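
The example is small enough to confirm by direct multiplication. This is a minimal sketch in plain Python, assuming B has rows (1, 0), (0, 0) and C has rows (0, 1), (0, 0); `matmul` is an illustrative helper.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

B = [[1, 0], [0, 0]]
C = [[0, 1], [0, 0]]
ZERO = [[0, 0], [0, 0]]

BC = matmul(B, C)
CB = matmul(C, B)

# Anything similar to the zero matrix is T 0 T^(-1) = 0, so a nonzero BC
# cannot be similar to a vanishing CB.
products_differ = BC != ZERO and CB == ZERO
```

Since the only matrix similar to 0 is 0 itself, the computed products show that BC and CB are not similar in this case.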


Comment. There is an analytic kind of argument, usually frowned upon as 
being foreign in spirit to pure algebra, that can sometimes be used to pass 
from information about invertible transformations to information about 
arbitrary ones. An example would be this: if B is invertible, then BC and 
CB are similar; if B is not invertible then it is the limit (here is the analysis) 
of a sequence {B_n} of invertible transformations. Since B_nC is similar to 
CB_n, it follows (?) by passage to the limit that BC is similar to CB. 

The argument is phony of course; where does it break down? What 
is true is that there exist invertible transformations T_n such that 


(B_nC)T_n = T_n(CB_n), 


and what the argument tacitly assumes is that the sequence of T's (or 
possibly a subsequence) converges to an invertible limit T. If that were true, 
then it would indeed follow that (BC)T = T(CB); hence, as the proof 
above implies, that cannot always be true. Here is a concrete example for 
the B and C mentioned above: if 

\[ B_n = T_n = \begin{pmatrix} 1 & 0 \\ 0 & 1/n \end{pmatrix}, \]

then indeed 
(B_nC)T_n = T_n(CB_n) 


for all n, and the sequence {T_n} converges all right, but its limit refuses 
to be invertible. 


Solution 89. 


Yes, two real matrices that are complex similar are also real similar. Sup- 
pose, indeed, that A and B are real and that 


SA= BS, 


where S is an invertible complex matrix. Write S in terms of its real and 
imaginary parts, 


S=P+iQ. 


Since PA +iQA = BP +iBQ, and since A, B, P, and Q are all real, it 
follows that 


PA = BP and QA = BQ. 


The problem might already be solved at this stage; it is solved if either 
P or Q is invertible, because in that case the preceding equations imply that 
A and B are “real similar". Even if P and Q are not invertible, however, 
the solution is not far away. 

Consider the polynomial 


p(λ) = det(P + λQ). 


Since p(i) = det(P + iQ) ≠ 0 (because S is invertible), the polynomial 
p is not identically 0. It follows that the equation p(λ) = 0 can have only 
a finite number of roots and hence that there exists a real number λ such 
that the real matrix P + λQ is invertible. That does it: since 


(P + λQ)A = PA + λQA = BP + λBQ = B(P + λQ), 


the matrices A and B are similar over the field of real numbers. 
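
Here is a small worked instance of the argument, offered as a sketch under assumptions: the matrices A, B and the complex S below are illustrative choices (not taken from the text) for which both the real part P and the imaginary part Q of S are singular, so that the detour through P + λQ is genuinely needed.

```python
def matmul(a, b):
    """Multiply two 2 x 2 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

A = [[1, 0], [0, 2]]
B = [[2, 0], [0, 1]]

P = [[0, 1], [0, 0]]   # real part of S, singular
Q = [[0, 0], [1, 0]]   # imaginary part of S, singular
S = [[P[i][j] + 1j * Q[i][j] for j in range(2)] for i in range(2)]

complex_similar = det2(S) != 0 and matmul(S, A) == matmul(B, S)

# p(lam) = det(P + lam*Q) is not identically zero; lam = 1 is not a root here.
lam = 1
R = [[P[i][j] + lam * Q[i][j] for j in range(2)] for i in range(2)]
real_similar = det2(R) != 0 and matmul(R, A) == matmul(B, R)
```

For these matrices p(λ) = −λ, so every nonzero real λ works; the choice λ = 1 yields the real invertible matrix R with RA = BR.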

The computation in this elementary proof is surely mild, but it’s there 
just the same. An alternative proof involves no computation at all, but it 
is much less elementary; it depends on the non-elementary concept of “el- 
ementary divisors”. They are polynomials associated with a matrix; their 
exact definition is not important at the moment. What is important is that 
their coefficients are in whatever field the entries of the matrix belong to, 
and that two matrices are similar if and only if they have the same elemen- 
tary divisors. Once these two statements are granted, the proof is finished: 
if A and B are real matrices that are similar (over whatever field happens 
to be under consideration, provided only that it contains the entries of A 
and B), then they have the same elementary divisors, and therefore they 
must be similar over every possible field that contains their entries, in 
particular over the field of real numbers. 


Solution 90. 
Since 
ker A' = (ran A)° 
(by Problem 80) and since 
dim (ran A)° = n − dim ran A 
(by Problem 65), it follows immediately that 
null A' = n − rank A. 


Suppose now that {x_1, ..., x_m} is a basis for ker A and extend it to a 
basis {x_1, ..., x_m, x_{m+1}, ..., x_n} of the entire space V. If 


x = α_1 x_1 + ··· + α_m x_m + α_{m+1} x_{m+1} + ··· + α_n x_n 
is an arbitrary vector in V, then 
Ax = α_{m+1} Ax_{m+1} + ··· + α_n Ax_n, 


which implies that ran A is spanned by the set {Ax_{m+1}, ..., Ax_n}. 
Consequence: 


dim ran A ≤ n − m, 
or, in other words, 
rank A ≤ n − null A. 


Apply the latter result to A', and make use of the equation above connecting 
null A' and rank A to get 


rank A' ≤ rank A. 


That almost settles everything. Indeed: apply it to A' in place of A to 
get 


rank A" ≤ rank A'; 


in view of the customary identification of A" and A, the last two inequalities 
together imply that 


rank A = rank A’. 
Consequence: 


null A' = n − rank A', 
and that equation, with A' in place of A (and the same identification 
argument as just above) yields 


rank A + null A = n. 


The answer to the first question of the problem as stated is that if 
rank A = r, then there is only one possible value of rank A’, namely the 
same r. The answer to the second question is that there is only one possible 
value of null A, namely n − r. 


Comment. A special case of the principal result above (rank + nullity = 
dimension) is obvious: if rank A = 0 (which means exactly that ran A = O), 
then A must be the transformation 0, and therefore null A must be n. A 
different special case, not quite that trivial, has already appeared in this 
book, in Problem 65. The theorem there says, in effect, that the nullity of a 
linear transformation on a space of dimension n is 0 if and only if its rank 
is n—and that says exactly that the sum formula is true in case null A = 0. 
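
The two conclusions (rank A = rank A' and rank + nullity = dimension) can be illustrated on a single matrix. The sketch below is plain Python over the rationals; it assumes A' is represented by the transposed matrix, computes a kernel basis from the reduced row echelon form rather than from the formula being tested, and uses an arbitrarily chosen 3 x 3 matrix. The names `rref` and `kernel_basis` are illustrative.

```python
from fractions import Fraction

def rref(matrix):
    """Return the reduced row echelon form and the list of pivot columns."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    pivots, r = [], 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        lead = rows[r][c]
        rows[r] = [v / lead for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    return rows, pivots

def kernel_basis(matrix):
    """One kernel vector per free column of the reduced form."""
    rows, pivots = rref(matrix)
    n = len(matrix[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for row_index, pivot_col in enumerate(pivots):
            v[pivot_col] = -rows[row_index][free]
        basis.append(v)
    return basis

A = [[1, 2, 3],
     [2, 4, 6],
     [1, 0, 1]]
A_transposed = [list(col) for col in zip(*A)]

rank_A = len(rref(A)[1])
rank_A_transposed = len(rref(A_transposed)[1])
kernel = kernel_basis(A)
in_kernel = all(
    all(sum(row[k] * v[k] for k in range(3)) == 0 for row in A) for v in kernel
)
```

For this matrix the rank is 2 for both A and its transpose, and the kernel is spanned by one vector, so the sum is the dimension 3.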


Solution 91. 


Yes, similar transformations have the same rank. Suppose, indeed, that B 
and C are linear transformations and T' is an invertible linear transforma- 
tion such that 


CT = TB. 
If y is a vector in ran B, so that y = Bx for some vector x, then the equation 
CTx = TBx = Ty 


implies that Ty is in ran C, and hence that y belongs to T⁻¹(ran C). What 
this argument proves is that 


ran B ⊂ T⁻¹(ran C). 


Since the invertibility of T implies that T⁻¹(ran C) has the same dimension 
as ran C, it follows that 


rank B ≤ rank C. 


The proof can be completed by a lighthearted call on symmetry. The 
assumption that B and C are similar is symmetric in B and C; if that 
assumption implies that rank B ≤ rank C, then it must also imply that 


rank C ≤ rank B, 
and from the two inequalities together it follows that 


rank B = rank C. 


Solution 92. 


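The answer is yes, every 2 x 2 matrix is similar to its transpose, and a 
surprisingly simple computation provides most of the proof: 

\[ \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} \begin{pmatrix} \beta & 0 \\ 0 & \gamma \end{pmatrix} = \begin{pmatrix} \alpha\beta & \beta\gamma \\ \beta\gamma & \gamma\delta \end{pmatrix} = \begin{pmatrix} \beta & 0 \\ 0 & \gamma \end{pmatrix} \begin{pmatrix} \alpha & \gamma \\ \beta & \delta \end{pmatrix}. \]

If neither β nor γ is 0, then the diagonal matrix with entries β and γ 
is invertible and that's that: 

\[ \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} \quad\text{is indeed similar to}\quad \begin{pmatrix} \alpha & \gamma \\ \beta & \delta \end{pmatrix}. \]

If β = 0 and γ ≠ 0, the proof is still easy, but it is not quite so near the 
surface. If worse comes to worst, computation is bound to reveal it: just set 

\[ \begin{pmatrix} \alpha & 0 \\ \gamma & \delta \end{pmatrix} \begin{pmatrix} \xi & \eta \\ \zeta & \theta \end{pmatrix} = \begin{pmatrix} \xi & \eta \\ \zeta & \theta \end{pmatrix} \begin{pmatrix} \alpha & \gamma \\ 0 & \delta \end{pmatrix} \]

and solve the implied system of four equations in four unknowns. It is 
of course not enough just to find numbers ξ, η, ζ, and θ that satisfy the 
equations (for instance ξ = η = ζ = θ = 0 always works); it is necessary 
also to find them so that the matrix is invertible. One possible solution is 
indicated by the equation 

\[ \begin{pmatrix} \alpha & 0 \\ \gamma & \delta \end{pmatrix} \begin{pmatrix} \alpha - \delta & \gamma \\ \gamma & 0 \end{pmatrix} = \begin{pmatrix} \alpha(\alpha - \delta) & \alpha\gamma \\ \alpha\gamma & \gamma^2 \end{pmatrix} = \begin{pmatrix} \alpha - \delta & \gamma \\ \gamma & 0 \end{pmatrix} \begin{pmatrix} \alpha & \gamma \\ 0 & \delta \end{pmatrix}. \]

That works (meaning that the matrix with rows (α − δ, γ) and (γ, 0) is 
invertible) because γ ≠ 0. 

The case in which γ = 0 and β ≠ 0, that is, the problem of the 
similarity of 

\[ \begin{pmatrix} \alpha & \beta \\ 0 & \delta \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} \alpha & 0 \\ \beta & \delta \end{pmatrix}, \]

is the same as the one just discussed: just replace β by γ and interchange 
the order in which the matrices were written. 

(Does the assertion that similarity is a symmetric relation deserve 
explicit mention? If B and C are similar, via T, that is if 


CT = TB, 
then 
T⁻¹(CT)T⁻¹ = T⁻¹(TB)T⁻¹. 


Replace TT⁻¹ and T⁻¹T by I and interchange the two sides of the 
equation to get 


BT⁻¹ = T⁻¹C, 
and that is exactly the similarity of C and B via T⁻¹.) 


If both β and γ are 0, the matrix 

\[ \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} \]

degenerates to a diagonal one, and the question of similarity to its 
transpose degenerates to a triviality: it is equal to its transpose. 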


Comment. That settles 2 x 2 matrices; what happens with matrices of 
size 3 and greater? The answer is that the same result is true for every 
size—every matrix is similar to its transpose—but even for 3 x 3 matri- 
ces the problem of generalizing the computations of the 2 x 2 case be- 
comes formidable. New ideas are needed, more sophisticated methods are 
needed. They exist, but they will come only later. 
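
For the 2 x 2 case, the intertwining matrices can be checked on numerical examples. The sketch below assumes two sample matrices and two candidate similarities: the diagonal matrix with entries β and γ when both are nonzero, and, when β = 0 and γ ≠ 0, the matrix with rows (α − δ, γ) and (γ, 0). The second choice is one solution that can be verified by hand and is offered as an assumption, not necessarily the one intended in the original.

```python
def matmul(a, b):
    """Multiply two 2 x 2 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(m):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*m)]

def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def intertwines(a, s):
    """True when s is invertible and a s = s a^T, so a is similar to a^T."""
    return det2(s) != 0 and matmul(a, s) == matmul(s, transpose(a))

# Case beta != 0 and gamma != 0: use the diagonal matrix diag(beta, gamma).
A1 = [[1, 2], [3, 4]]
S1 = [[A1[0][1], 0], [0, A1[1][0]]]

# Case beta == 0 and gamma != 0: use rows (alpha - delta, gamma), (gamma, 0).
A2 = [[1, 0], [3, 4]]
alpha, gamma, delta = A2[0][0], A2[1][0], A2[1][1]
S2 = [[alpha - delta, gamma], [gamma, 0]]

case_generic = intertwines(A1, S1)
case_lower_triangular = intertwines(A2, S2)
```

Both flags come out true, so in each sample case the matrix and its transpose are similar via the indicated S.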


Solution 93. 


The answer is 
rank(A + B) ≤ rank A + rank B. 
For the proof, observe first that 
ran A + ran B 


(in the sense of sums of subspaces, defined in Problem 28) consists of all 
vectors of the form 


Ax + By, 
and that 
ran(A + B) 
consists of all vectors of the form 
Ax + Bx. 
Consequence: 


ran(A + B) ⊂ ran A + ran B, 
and, as a consequence of that, 
rank(A + B) ≤ dim(ran A + ran B). 
How is the right side of this inequality related to 
dim ran A + dim ran B? 
In general: if M and N are subspaces, what is the relation between 
dim(M + N) and dim M + dim N? 


The answer is a natural one to guess and an easy one to prove, as 
follows. If {x_1, ..., x_m} is a basis for M and {y_1, ..., y_n} is a basis for N, then 
the set 


{x_1, ..., x_m, y_1, ..., y_n} 


is surely big enough to span M + N. Consequence: the dimension of M + N 
is not more than m + n, or, in other words, 


dim(M + N) ≤ dim M + dim N. 


The proof of the rank sum inequality is complete. 


Solution 94. 
Since (AB)x = A(Bx), it follows that 
ran AB ⊂ ran A, 
and hence that 
rank AB ≤ rank A. 


Words are more useful here than formulas: what was just proved is that the 
rank of a product is less than or equal to the rank of the left-hand factor. 
That formulation implies that 


rank(B'A') ≤ rank B'. 
Since, however, 
rank(B'A') = rank((AB)') = rank AB, 
and 


rank B' = rank B 
(Problem 88), it follows that rank AB is less than or equal to both rank A 
and rank B, and hence that 


rank AB ≤ min{rank A, rank B}. 

That's it; that's the good relation between the rank of a product and the 
ranks of its factors. 
Comment. If B happens to be invertible, so that rank B is equal to the 
dimension of the space, then the result just proved implies that 

rank AB ≤ rank A 
and at the same time that 

rank A = rank((AB)B⁻¹) ≤ rank AB, 

so that, in fact, 

rank AB = rank A. 
It follows that 

rank(BA) = rank((BA)') = rank(A'B') = rank A' = rank A. 


In sum: the product of a given transformation with an invertible one (in 
either order) always has the same rank as the given one. 


Solution 95. 


The range of a transformation A is the image under A of the entire space 
V, and its dimension is an old friend by now—that’s just the rank. What 
can be said about the dimension of the image of a proper subspace of V? 
The question is pertinent because 


ran(AB) = (AB)V = A(BV) = A(ran B), 
so that 
rank(AB) = dim(A(ran B)). 


If M is a subspace of dimension m, say, and if N is any complement of 
M, so that 


V = M + N, 
then 


ran A = AV = AM + AN. 
It follows that 
rank A ≤ dim(AM) + dim(AN) ≤ dim(AM) + dim N 


(because the application of a linear transformation can never increase 
dimension), and hence that 


n − null A ≤ dim(AM) + n − m 
(where n = dim V, of course). If in particular 
M = ran B, 
then the last inequality implies that 
rank B − null A ≤ rank(AB), 
or, equivalently, that 
n − null A − null B ≤ n − null(AB). 
Conclusion: 
null(AB) ≤ null A + null B. 


Together the two inequalities about products, namely the one just proved 
about nullity and the one (Problem 89) about rank, 


rank AB ≤ min{rank A, rank B}, 


are known as Sylvester's law of nullity. 
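
Both inequalities can be tried out on a small example. The following sketch is plain Python over the rationals; the matrices A and B are arbitrary illustrative choices, nullity is computed as n − rank (the rank-nullity relation of Solution 90), and `rank` is a hypothetical helper based on Gaussian elimination.

```python
from fractions import Fraction

def rank(matrix):
    """Rank by forward Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

n = 3
A = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
B = [[0, 0, 0], [0, 1, 0], [0, 0, 1]]
AB = matmul(A, B)

rank_A, rank_B, rank_AB = rank(A), rank(B), rank(AB)
null_A, null_B, null_AB = n - rank_A, n - rank_B, n - rank_AB

rank_bound_holds = rank_AB <= min(rank_A, rank_B)
nullity_bound_holds = null_AB <= null_A + null_B
```

In this example the nullity bound is attained: null(AB) = 2 = null A + null B, while rank AB = 1 lies strictly below min{rank A, rank B} = 2.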


Solution 96. 


(a) The "natural" basis vectors 
e_1 = (1, 0, 0), e_2 = (0, 1, 0), e_3 = (0, 0, 1) 
have a curious and special relation to the transformation C: it happens that 
Ce_1 = e_1, Ce_2 = 2e_2, Ce_3 = 3e_3. 
If B and C were similar, 
CT = TB, 
or, equivalently, 


BT⁻¹ = T⁻¹C, 
then the vectors 
T⁻¹e_1, T⁻¹e_2, T⁻¹e_3 
would have the same relation to B: 
BT⁻¹e_j = T⁻¹Ce_j = T⁻¹(je_j) = jT⁻¹e_j 


for j = 1, 2, 3. Is that possible? Are there any vectors that are so related 
to B? 

There is no difficulty about j = 1: the vector f_1 can be chosen to be 
the same as e_1. What about f_2? Well, that's not hard either. Since 


Be_1 = e_1, Be_2 = e_1 + 2e_2, 
it follows that 
B(e_1 + e_2) = e_1 + (e_1 + 2e_2) = 2(e_1 + e_2); 
in other words if f_2 = e_1 + e_2, then 
Bf_2 = 2f_2. 
Is this the beginning of a machine? Yes, it is. Since 
Be_3 = e_1 + e_2 + 3e_3, 
it follows that 
B(e_1 + e_2 + e_3) = B(e_1 + e_2) + Be_3 
= 2(e_1 + e_2) + (e_1 + e_2 + 3e_3) = 3(e_1 + e_2 + e_3); 
to get 
Bf_3 = 3f_3, 
just set 
f_3 = e_1 + e_2 + e_3. 


What good does all that do? Answer: it proves that B and C are 
similar. Indeed: the vectors f_1, f_2, f_3, expressed in coordinate forms in terms 
of the e's as 


f_1 = (1, 0, 0) 
f_2 = (1, 1, 0) 
f_3 = (1, 1, 1), 


constitute a basis. The matrix of B with respect to that basis is the matrix 
C. Isn't that clear from the definition of the matrix of B with respect to the 
basis {f_1, f_2, f_3}? What that phrase means is this: form Bf_j (for each j), 
and express it as a linear combination of the f's; the resulting coefficients 
are the entries of column number j. Conclusion: B and C are similar. 
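
The conclusion of part (a) can be confirmed numerically. In the sketch below the matrix B is inferred from the relations Be_1 = e_1, Be_2 = e_1 + 2e_2, Be_3 = e_1 + e_2 + 3e_3 used in the solution (its columns are those images), and C is taken to be the diagonal matrix with entries 1, 2, 3; both are assumptions read off from the solution rather than from the problem statement itself.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def apply(m, v):
    """Apply the matrix m to the coordinate vector v."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

B = [[1, 1, 1],
     [0, 2, 1],
     [0, 0, 3]]
C = [[1, 0, 0],
     [0, 2, 0],
     [0, 0, 3]]

f = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]   # f_1, f_2, f_3 in coordinates

# B f_j = j f_j for j = 1, 2, 3.
scaled = all(apply(B, f[j]) == [(j + 1) * c for c in f[j]] for j in range(3))

# With the f's as columns of F, similarity reads B F = F C.
F = [[f[j][i] for j in range(3)] for i in range(3)]
similar = matmul(B, F) == matmul(F, C)
```

Since F has determinant 1 (it is triangular with ones on the diagonal), the relation BF = FC is exactly the similarity of B and C.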

(b) The reasoning here is similar to the one used above. The linear 
transformation C has the property that 


Ce_1 = 0, Ce_2 = e_1, Ce_3 = e_2. 
If B and C are similar, 
BT⁻¹ = T⁻¹C, 
then the vectors 
f_j = T⁻¹e_j (j = 1, 2, 3) 
are such that Bf_1 = 0, and, for j > 1, 
Bf_j = BT⁻¹e_j = T⁻¹Ce_j = T⁻¹e_{j−1} = f_{j−1}. 


At this moment it may not be known that B and C are similar, but it makes 
sense to ask whether there exist vectors f_1, f_2, f_3 that B treats in the way 
just described. 

Yes, such vectors exist, and the proof is not difficult. Just set f_1 equal 
to e_1, set f_2 equal to e_2, and then start looking for f_3. Since Be_3 = e_1 + e_2, 
it follows that 


B(e_3 − e_2) = (e_1 + e_2) − e_1 = e_2; 
in other words, if f_3 = e_3 − e_2, then 


Bf_3 = f_2. 


Once that's done, the problem is solved. The vectors f_1, f_2, f_3, expressed 
in coordinate forms in terms of the e's as 


f_1 = (1, 0, 0) 
f_2 = (0, 1, 0) 
f_3 = (0, −1, 1), 


constitute a basis, and the matrix of B with respect to that basis is equal 
to C. 

(c) The most plausible answer to both (a) and (b) is no—how could 
a similarity kill all the entries above the diagonal? Once, however, the
answers have been shown to be yes, most people approaching (c) would
probably be ready to guess yes; but this time the answer is no.

What is obvious is that

$$Ce_1 = 2e_1, \qquad Ce_2 = 3e_2, \qquad Ce_3 = 3e_3.$$

What is one millimeter less obvious is that every linear combination of $e_2$
and $e_3$ is mapped onto 3 times itself by C. What must therefore be asked
(in view of the technique established in (a)) is whether or not there exist
vectors $f_1, f_2, f_3$ that B treats the way C treats the e's. The answer turns
out to be no.

Suppose indeed that $Bf = 3f$, where f is a vector whose coordinate
form in terms of the e's is, say, $(\alpha_1, \alpha_2, \alpha_3)$. Since

$$Bf = (2\alpha_1 + \alpha_2 + \alpha_3,\; 3\alpha_2 + \alpha_3,\; 3\alpha_3),$$

the only way that can be equal to

$$3f = (3\alpha_1, 3\alpha_2, 3\alpha_3)$$

is to have $\alpha_3 = 0$ (look at the second coordinates). From that in turn it
follows that $\alpha_2 = \alpha_1$ (look at the first coordinates). To sum up: f must
look like $(\tau, \tau, 0)$, or simpler said, every solution of the vector equation
$Bf = 3f$ is of the form $(\tau, \tau, 0)$. Consequence: the set of solutions of that
vector equation is a subspace of dimension 1, not 2. For C the corresponding
dimension was 2, and that distinction settles the argument: B and C
cannot be similar.
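
The dimension count that separates B from C can be reproduced with exact arithmetic. This is a sketch under stated assumptions: B is read off from the coordinate formula for $Bf$ above, C is taken to be the diagonal matrix with entries 2, 3, 3, and the helper names `rank` and `eigenspace_dimension` are hypothetical.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    rows = [[Fraction(entry) for entry in row] for row in matrix]
    rank_count = 0
    for col in range(len(rows[0])):
        # Look for a pivot in this column among the rows not yet used.
        pivot = next((r for r in range(rank_count, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank_count], rows[pivot] = rows[pivot], rows[rank_count]
        for r in range(len(rows)):
            if r != rank_count and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank_count][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank_count])]
        rank_count += 1
    return rank_count

def eigenspace_dimension(matrix, eigenvalue):
    """Dimension of the solution space of (matrix - eigenvalue) f = 0."""
    n = len(matrix)
    shifted = [[matrix[i][j] - (eigenvalue if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)

# B comes from Bf = (2a1 + a2 + a3, 3a2 + a3, 3a3); C is assumed to be diag(2, 3, 3).
B = [[2, 1, 1], [0, 3, 1], [0, 0, 3]]
C = [[2, 0, 0], [0, 3, 0], [0, 0, 3]]

dim_B = eigenspace_dimension(B, 3)
dim_C = eigenspace_dimension(C, 3)
print(dim_B, dim_C)
```

Similar matrices must give the same dimension for every eigenvalue, so a mismatch here is enough to rule out similarity.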

(d) In view of all this, what would a reasonable person guess about (d) 
by now? Is it imaginable that a similarity can double a linear transforma- 
tion? 

Yes, it is. The action of B on the natural basis $\{e_1, e_2, e_3\}$ can be
described this way: the first basis vector is killed, and the other two are shifted
backward to their predecessors. The question is this: is there a basis such
that the first of its vectors is killed and each of the others is shifted
backward to twice its predecessor? In that form the answer is easy to see: put

$$f_1 = (1, 0, 0), \qquad f_2 = (0, 2, 0), \qquad f_3 = (0, 0, 4).$$


That solves the problem, and nothing more needs to be said, but it
might be illuminating to see a linear transformation that sends the e's to
the f's and, therefore, actually transforms B into C. That's not hard: if

$$T = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 \\ 0 & 0 & \tfrac{1}{4} \end{pmatrix},$$

then

$$T^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 4 \end{pmatrix},$$

and painless matrix multiplication proves that $TBT^{-1} = 2B$.
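
The "painless matrix multiplication" can be delegated to a few lines of exact arithmetic. The sketch below assumes that B is the shift described in (d), and that the diagonal matrix carrying $e_1, e_2, e_3$ to $f_1, f_2, f_3$ is $T^{-1} = \mathrm{diag}(1, 2, 4)$, so that $T = \mathrm{diag}(1, \tfrac{1}{2}, \tfrac{1}{4})$; both are assumptions read off from the discussion, not quoted matrices.

```python
from fractions import Fraction

def matmul(left, right):
    """Product of two square matrices given as lists of rows."""
    n = len(left)
    return [[sum(left[i][k] * right[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# B kills e1 and shifts e2, e3 backward to their predecessors.
B = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]

# Assumed diagonal change of basis: T_inv carries e1, e2, e3 to f1, f2, f3.
T_inv = [[1, 0, 0], [0, 2, 0], [0, 0, 4]]
T = [[1, 0, 0], [0, Fraction(1, 2), 0], [0, 0, Fraction(1, 4)]]

conjugated = matmul(matmul(T, B), T_inv)
double_B = [[2 * entry for entry in row] for row in B]
print(conjugated == double_B)
```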

(e) The matrix of B with respect to the natural basis $\{e_1, e_2, e_3\}$ is the
one exhibited in the question; what is the matrix of B with respect to the
basis given by

$$f_1 = e_3, \qquad f_2 = e_2, \qquad f_3 = e_1?$$

The answer is as easy as any matrix determination can ever be. Since

$$\begin{aligned}
Bf_1 &= Be_3 = e_1 + e_2 + e_3 = f_3 + f_2 + f_1, \\
Bf_2 &= Be_2 = e_1 + e_2 = f_3 + f_2, \\
Bf_3 &= Be_1 = e_1 = f_3,
\end{aligned}$$

it follows that the matrix of B with respect to the f's is exactly C.
Note that C is the transpose of B; compare with Problem 92. 


Solution 97. 
Define a linear transformation P by

$$P\tilde{x}_j = \tilde{y}_j \qquad (j = 1, \ldots, n)$$

and compute:

$$Cy_j = \sum_i \alpha_{ij}\tilde{y}_i = \sum_i \alpha_{ij}P\tilde{x}_i = P\sum_i \alpha_{ij}\tilde{x}_i = PBx_j.$$

The result almost forces the next step: to make the two extreme terms of
this chain of equalities comparable, it is desirable to introduce the linear
transformation Q for which

$$x_j = Qy_j \qquad (j = 1, \ldots, n).$$

The result is that $Cy_j = PBQy_j$ for all j, and hence that

$$C = PBQ.$$


That's the answer: B and C are equivalent if and only if there exist
invertible transformations P and Q such that $C = PBQ$. The argument
just given proves "only if", but, just as for Problem 85, the "if" must be
proved too. The proof is a routine imitation of the proof in Problem 85,
except it is not quite obvious how to set up the notation: what can be chosen
arbitrarily to begin with and what should be defined in terms of it? Here is
one way to answer those questions.

Assume that $C = PBQ$, and choose, arbitrarily, two bases

$$\{\tilde{x}_1, \ldots, \tilde{x}_n\} \quad\text{and}\quad \{y_1, \ldots, y_n\}.$$

Write

$$x_j = Qy_j \quad\text{and}\quad \tilde{y}_j = P\tilde{x}_j,$$

and write B as a matrix with respect to the x's and $\tilde{x}$'s:

$$Bx_j = \sum_i \alpha_{ij}\tilde{x}_i.$$

It follows that

$$Cy_j = PBQy_j = PBx_j = P\sum_i \alpha_{ij}\tilde{x}_i = \sum_i \alpha_{ij}P\tilde{x}_i = \sum_i \alpha_{ij}\tilde{y}_i.$$

Comparison of the last two displayed equations shows the matrix $C(Y, \tilde{Y})$
of C with respect to the y's is the same as the matrix $B(X, \tilde{X})$ of B with
respect to the x's.


Question. If B is equivalent to C, does it follow that $B^2$ is equivalent to
$C^2$? The first attempt at answering the question, without using the following
problem, is not certain to be successful.


Solution 98. 


Suppose that A is a linear transformation of rank r, say, on a finite-dimensional
vector space V. Since the kernel of A is a subspace of dimension
$n - r$, standard techniques of extending bases show that there exists a basis

$$x_1, \ldots, x_r, x_{r+1}, \ldots, x_n$$

of V such that $\{x_{r+1}, \ldots, x_n\}$ is a basis for ker A. Assertion: the vectors

$$y_1 = Ax_1, \; \ldots, \; y_r = Ax_r$$

are linearly independent. Indeed: the only way it can happen that

$$\gamma_1 Ax_1 + \cdots + \gamma_r Ax_r = 0$$

is to have $\gamma_1 x_1 + \cdots + \gamma_r x_r$ in the kernel of A, and that forces every
$\gamma_j$ to be 0. Reason: since the x's form
a basis for V, the only way a linear combination of the first r of them can
be equal to a linear combination of the last $n - r$ is to have all coefficients
equal to 0.

Once that is known, then, of course, the set $\{y_1, \ldots, y_r\}$ can be
extended to a basis

$$y_1, \ldots, y_r, y_{r+1}, \ldots, y_n$$

of V. What is the matrix of A with respect to the pair of bases (the x's and
the y's) under consideration here? Answer: it is the $n \times n$ diagonal matrix
the first r of whose diagonal terms are 1's and the last $n - r$ are 0's.

That remarkable conclusion should come as a surprise. It implies that
every matrix of rank r is equivalent to a projection of rank r, and hence
that any two matrices of rank r are equivalent to one another.


Chapter 7. Canonical Forms 


Solution 99.

If E is a projection, and if $\lambda$ is an eigenvalue of E with eigenvector x, so
that $Ex = \lambda x$, then

$$Ex = E^2x = E(Ex) = E(\lambda x) = \lambda Ex = \lambda(\lambda x) = \lambda^2 x.$$

Since $\lambda x = \lambda^2 x$ and $x \neq 0$ (by the definition of eigenvector), it follows
that $\lambda = \lambda^2$. Consequence: the only possible eigenvalues of E are 0 and 1.
Since the roots of the characteristic equation are exactly the eigenvalues,
it follows that the only possible factors of the characteristic polynomial can
be $\lambda$ and $1 - \lambda$, and hence that the characteristic polynomial must be of
the form $\lambda^k(\lambda - 1)^{n-k}$, with $k = 0, 1, \ldots, n$.


Question. If rank E = 0 (that is, if E = 0), then k = n; if rank E = n
(that is, if E = 1), then k = 0; what is k for other values of rank E?


Solution 100. 


The sum of the roots of a (monic) polynomial equation

$$\lambda^n + \alpha_1\lambda^{n-1} + \cdots + \alpha_{n-1}\lambda + \alpha_n = 0$$

is equal to $-\alpha_1$, and the product of the roots is equal to plus or minus $\alpha_n$
(depending on whether n is even or odd). To become convinced of these
statements, just write the polynomial in factored form

$$(\lambda - \lambda_1) \cdots (\lambda - \lambda_n).$$

It follows that the sum and the product of the eigenvalues of a matrix A
belong to the field in which the entries of A lie.

The product of the eigenvalues is equal to the determinant (think
about triangularization), and that observation yields an alternative proof
of the assertion about the product. The sum of the eigenvalues is also easy
to read off the matrix A: it is the coefficient of $(-\lambda)^{n-1}$ in the expansion
of $\det(A - \lambda)$, and hence it is the sum of the diagonal entries. That sum
has a name: it is called the trace of the matrix A.

The answer to the question as it was asked is a strong NO: even though 
the eigenvalues of a rational matrix can be irrational, their sum and their 
product must be rational. 
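
A small numerical illustration may help; it is only a sketch, and the sample matrix is a hypothetical choice, not one taken from the problem. The matrix below has rational entries and irrational eigenvalues, yet their sum and product land on the (rational) trace and determinant.

```python
import math

# A rational 2 x 2 matrix whose eigenvalues are irrational (sample choice).
A = [[0, 2],
     [1, 0]]

trace = A[0][0] + A[1][1]
determinant = A[0][0] * A[1][1] - A[0][1] * A[1][0]

# Eigenvalues from the quadratic formula applied to
# lambda^2 - trace * lambda + determinant = 0.
discriminant = trace * trace - 4 * determinant
eigenvalues = [(trace + math.sqrt(discriminant)) / 2,
               (trace - math.sqrt(discriminant)) / 2]

print(eigenvalues)                                    # plus and minus the square root of 2
print(sum(eigenvalues), trace)                        # sum versus trace
print(eigenvalues[0] * eigenvalues[1], determinant)   # product versus determinant
```

The comparison is made in floating point, so agreement should be expected only up to rounding.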


Solution 101. 


The answer is yes: AB and BA always have the same eigenvalues.

It is to be proved that if $\lambda \neq 0$ and $AB - \lambda$ is not invertible, then
neither is $BA - \lambda$, or, contrapositively, that if $AB - \lambda$ is invertible, then
so is $BA - \lambda$. Change signs (that is surely harmless), divide by $\lambda$, and then
replace A, say, by $A/\lambda$. These manipulations reduce the problem to proving
that if $1 - AB$ is invertible, then so is $1 - BA$.

At this point it turns out to be clever to do something silly. Pretend
that the classical infinite series formula for $\frac{1}{1-x}$,

$$\frac{1}{1-x} = 1 + x + x^2 + x^3 + \cdots,$$

is applicable, and apply it, as a matter of purely formal juggling, to BA in
place of x. The result is

$$\begin{aligned}
(1 - BA)^{-1} &= 1 + BA + BABA + BABABA + \cdots \\
&= 1 + B(1 + AB + ABAB + \cdots)A \\
&= 1 + B(1 - AB)^{-1}A.
\end{aligned}$$


Granted that this is all meaningless, it suggests that, maybe, if $1 - AB$ is
invertible, with, say,

$$(1 - AB)^{-1} = X,$$

then $1 - BA$ is also invertible, with

$$(1 - BA)^{-1} = 1 + BXA.$$

Now that statement may be true or false, but it is in any case meaningful:
it can be checked. Assume, that is, that $(1 - AB)X = 1$ (which can be
written as $ABX = X - 1$) and calculate:

$$\begin{aligned}
(1 - BA)(1 + BXA) &= (1 + BXA) - BA(1 + BXA) \\
&= 1 + BXA - BA - BABXA \\
&= 1 + BXA - BA - B(X - 1)A \\
&= 1 + BXA - BA - BXA + BA = 1.
\end{aligned}$$

Victory!
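
The formula $(1 - BA)^{-1} = 1 + BXA$ can also be tested on a concrete pair of matrices. The following sketch uses exact rational arithmetic; the sample matrices A and B are hypothetical (any pair for which $1 - AB$ is invertible would do), and the helper names are introduced only for this illustration.

```python
from fractions import Fraction

def matmul(left, right):
    """Product of two square matrices given as lists of rows."""
    n = len(left)
    return [[sum(left[i][k] * right[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def combine(left, right, sign):
    """Entrywise left + sign * right."""
    return [[a + sign * b for a, b in zip(row_l, row_r)]
            for row_l, row_r in zip(left, right)]

def inverse_2x2(m):
    """Inverse of an invertible 2 x 2 matrix by the adjugate formula."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

ONE = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
# Hypothetical sample matrices; here 1 - AB has determinant -6.
A = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]
B = [[Fraction(0), Fraction(1)], [Fraction(1), Fraction(1)]]

X = inverse_2x2(combine(ONE, matmul(A, B), -1))       # X = (1 - AB)^(-1)
candidate = combine(ONE, matmul(matmul(B, X), A), 1)  # 1 + BXA
one_minus_BA = combine(ONE, matmul(B, A), -1)

left_product = matmul(one_minus_BA, candidate)
right_product = matmul(candidate, one_minus_BA)
print(left_product == ONE, right_product == ONE)
```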


Solution 102. 
If $\lambda$ is an eigenvalue of A with eigenvector x,

$$Ax = \lambda x,$$

then

$$A^2x = A(Ax) = A(\lambda x) = \lambda(Ax) = \lambda(\lambda x) = \lambda^2 x,$$

and, by an obvious inductive repetition of this argument,

$$A^n x = \lambda^n x$$

for every positive integer n. (For the integer 0 the equation is if possible
even truer.) This in effect answers the question about monomials. A linear
combination of a finite number of these true equations yields a true equation.
That statement is a statement about polynomials in general: it says
that if p is a polynomial, then $p(\lambda)$ is an eigenvalue of $p(A)$.
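
A concrete instance of the statement, offered as a sketch: the matrix A, the eigenvector x, and the polynomial $p(t) = t^2 - 4t + 5$ below are hypothetical samples chosen for illustration, not data from the problem.

```python
def matmul(left, right):
    """Product of two square matrices given as lists of rows."""
    n = len(left)
    return [[sum(left[i][k] * right[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apply(matrix, vector):
    """Multiply a square matrix (a list of rows) by a vector."""
    return [sum(row[k] * vector[k] for k in range(len(vector))) for row in matrix]

A = [[2, 1],
     [0, 3]]
x = [1, 1]        # A x = (3, 3), so x is an eigenvector with eigenvalue 3
eigenvalue = 3

def p_scalar(t):
    """Sample polynomial p(t) = t^2 - 4t + 5."""
    return t * t - 4 * t + 5

def p_matrix(m):
    """The same polynomial evaluated at a square matrix: m^2 - 4m + 5."""
    n = len(m)
    square = matmul(m, m)
    return [[square[i][j] - 4 * m[i][j] + (5 if i == j else 0) for j in range(n)]
            for i in range(n)]

left = apply(p_matrix(A), x)                   # p(A) x
right = [p_scalar(eigenvalue) * c for c in x]  # p(lambda) x
print(left, right)
```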


Solution 103. 


Since the matrix of A is

$$\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix},$$

the characteristic equation is

$$\lambda^3 = 1.$$

Since

$$\lambda^3 - 1 = (\lambda - 1)(\lambda^2 + \lambda + 1),$$

it follows that the roots of the characteristic equation are the three cube
roots of unity:

$$1, \qquad \omega = -\frac{1}{2} + \frac{i}{2}\sqrt{3}, \qquad\text{and}\qquad \omega^2 = -\frac{1}{2} - \frac{i}{2}\sqrt{3}.$$

The corresponding eigenvectors are easy to calculate (and even easier to
guess and to verify); they are

$$u = (1, 1, 1), \qquad v = (1, \omega, \omega^2), \qquad\text{and}\qquad w = (1, \omega^2, \omega).$$
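
The verification that is "even easier" than the calculation can be sketched numerically. The code assumes that A is the cyclic shift $(x_1, x_2, x_3) \mapsto (x_2, x_3, x_1)$, which is consistent with the eigenvectors listed above but is an assumption about the problem statement; the function name `cyclic_shift` is hypothetical.

```python
import cmath

omega = cmath.exp(2j * cmath.pi / 3)   # a primitive cube root of unity

def cyclic_shift(vector):
    """Assumed form of A: (x1, x2, x3) -> (x2, x3, x1)."""
    return vector[1:] + vector[:1]

pairs = [
    (1, [1, 1, 1]),
    (omega, [1, omega, omega ** 2]),
    (omega ** 2, [1, omega ** 2, omega]),
]

residuals = []
for eigenvalue, vector in pairs:
    image = cyclic_shift(vector)
    # Largest coordinate of A v - lambda v; it should vanish up to rounding.
    residuals.append(max(abs(a - eigenvalue * b) for a, b in zip(image, vector)))

print(residuals)
```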
Comment. It is easy and worth while to generalize the question to
dimensions greater than 3 and to permutations more complicated than the
simple cyclic permutation that sends (1, 2, 3) to (2, 3, 1). The most
primitive instance of this kind occurs in dimension 2. The eigenvalues of the
transformation A defined on $\mathbb{C}^2$ by

$$A(x_1, x_2) = (x_2, x_1)$$

are of course the eigenvalues

$$1 \quad\text{and}\quad -1$$

of the matrix

$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},$$

with corresponding eigenvectors

$$(1, 1) \quad\text{and}\quad (1, -1).$$

These two vectors constitute a basis; the matrix of A with respect to that
basis is

$$\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$


The discussion of the $3 \times 3$ matrix A of the problem can also be
regarded as solving a diagonalization problem; its result is that that matrix is
similar to

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & \omega & 0 \\ 0 & 0 & \omega^2 \end{pmatrix}.$$

The next higher dimension, n = 4, is of interest. There the matrix
becomes

$$\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{pmatrix}.$$

Its characteristic equation is

$$\lambda^4 = 1,$$

and, therefore, its eigenvalues are the fourth roots of unity:

$$1, \quad i, \quad -1, \quad -i.$$

Consequence: the diagonalized version of the $4 \times 4$ matrix is

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & i & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -i \end{pmatrix}.$$


Solution 104. 


Yes, every eigenvalue of $p(A)$ is of the form $p(\lambda)$ for some eigenvalue $\lambda$ of
A. Indeed, if $\mu$ is an eigenvalue of $p(A)$, consider the polynomial $p(\lambda) - \mu$.
By the fundamental theorem of algebra that polynomial can be factored
into linear pieces. If

$$p(\lambda) - \mu = (\lambda - \lambda_1) \cdots (\lambda - \lambda_n),$$

then

$$p(A) - \mu = (A - \lambda_1) \cdots (A - \lambda_n).$$

The assumption about $\mu$ implies that there exists a non-zero vector x such
that $p(A)x = \mu x$, and from that it follows that

$$(A - \lambda_1) \cdots (A - \lambda_n)x = 0.$$

Consequence: the product of the linear transformations

$$(A - \lambda_1) \cdots (A - \lambda_n)$$

is not invertible, and from that it follows that $A - \lambda_j$ is not invertible for
at least one j. That means that $\lambda_j$ is an eigenvalue of A for at least one j.
Conclusion: since

$$p(\lambda_j) - \mu = 0,$$

the eigenvalue $\mu$ does indeed have the form $p(\lambda)$ for some eigenvalue $\lambda$ of
A (namely $\lambda = \lambda_j$).


Comment. The set of all eigenvalues of a linear transformation A on a 
finite-dimensional vector space is called the spectrum of A and is often 
referred to by the abbreviation spec A. With that notation Solution 102 
can be expressed by saying that if A is a linear transformation, then 


p(spec A) ⊂ spec(p(A)),


and the present solution strengthens that to 


p(spec A) = spec(p(A)). 


Another comment deserves to be made, one about the factorization 
technique used in the proof above. Is spec(A) always non-empty? That 
is: does every linear transformation on a finite-dimensional vector space 
have an eigenvalue? The answer is yes, of course, and the shortest proof 
of the answer (the one used till now) uses determinants (the characteristic 
equation of A). The factorization technique provides an alternative proof, 
one without determinants, as follows. 

Given A, on a space of dimension n, take any non-zero vector x and
form the vectors

$$x, Ax, A^2x, \ldots, A^n x.$$

Since there are $n + 1$ of them, they cannot be linearly independent. It
follows that there exist scalars $\alpha_0, \alpha_1, \alpha_2, \ldots, \alpha_n$, not all 0, such that

$$\alpha_0 x + \alpha_1 Ax + \alpha_2 A^2x + \cdots + \alpha_n A^n x = 0.$$

It simplifies the notation and it loses no information to assume that if k
is the largest index for which $\alpha_k \neq 0$, then, in fact, $\alpha_k = 1$; just divide
through by $\alpha_k$. A different language for saying what the preceding
equation says (together with the normalization of the preceding sentence) is
to say that there exists a monic polynomial p of degree less than or equal
to n such that $p(A)x = 0$. Apply the fundamental theorem of algebra to
factor p:

$$p(\lambda) = (\lambda - \lambda_1) \cdots (\lambda - \lambda_k).$$

Since $p(A)x = 0$ it is possible to reason as in the solution above to infer
that $A - \lambda_j$ is not invertible for at least one j, and there it is: $\lambda_j$ is an
eigenvalue of A.

The fundamental theorem of algebra is one of the deepest and most 
useful facts of mathematics—its repeated use in linear algebra should not 
come as a surprise. The need to use it is what makes it necessary to work 
with complex numbers instead of only real ones. 


Solution 105. 


The polynomials $1, x, x^2, x^3$ form a basis of $P_3$; the matrix of D with respect
to that basis is

$$\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$

Consequences: the only eigenvalue of D is 0, and the characteristic
polynomial of D is $\lambda^4$. The algebraic multiplicity of the eigenvalue 0 is the
exponent 4. What about the geometric multiplicity? The question is about
the solutions p of the equation 


Dp = 0; 


in other words, the question is about the most trivial possible differen- 
tial equation. Since the only functions (polynomials) whose derivative is 
0 are the constants, the geometric multiplicity of 0 (the dimension of the 
eigenspace corresponding to 0) is 1. 
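
Both multiplicities can be read off from the matrix of D by a short computation. This is a sketch only; it relies on the matrix of D with respect to the basis $1, x, x^2, x^3$, and the helper names are hypothetical.

```python
def matmul(left, right):
    """Product of two square matrices given as lists of rows."""
    n = len(left)
    return [[sum(left[i][k] * right[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_zero(matrix):
    return all(entry == 0 for row in matrix for entry in row)

# Matrix of D with respect to the basis 1, x, x^2, x^3.
D = [[0, 1, 0, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 3],
     [0, 0, 0, 0]]

D2 = matmul(D, D)
D3 = matmul(D2, D)
D4 = matmul(D3, D)

# D is already in echelon form, so its rank is the number of non-zero rows,
# and the dimension of the kernel is the size minus the rank.
rank_D = sum(1 for row in D if any(row))
nullity_D = len(D) - rank_D

print(is_zero(D3), is_zero(D4), nullity_D)
```

The fourth power vanishes while the third does not, and the kernel is 1-dimensional, in agreement with algebraic multiplicity 4 and geometric multiplicity 1.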


Solution 106. 


The answer is good: every transformation on an n-dimensional space with 
n distinct eigenvalues is diagonalizable. 

Suppose, to begin with, that n = 2. If A is a linear transformation
on a 2-dimensional space, with distinct eigenvalues $\lambda_1$ and $\lambda_2$ and
corresponding eigenvectors $x_1$ and $x_2$, then (surprise?) $x_1$ and $x_2$ are linearly
independent. Reason: if

$$\alpha_1 x_1 + \alpha_2 x_2 = 0,$$

apply $A - \lambda_1$ to that equation. Since $A - \lambda_1$ kills $x_1$, the result is

$$\alpha_2(\lambda_2 - \lambda_1)x_2 = 0,$$

and since $\lambda_2 - \lambda_1 \neq 0$ (assumption) and $x_2 \neq 0$ (eigenvector), it follows
that $\alpha_2 = 0$. That in turn implies that $\alpha_1 = 0$, or, alternatively, an
application of $A - \lambda_2$ to the assumed equation yields the same conclusion.

If n = 3, and if the three distinct eigenvalues in question are $\lambda_1$, $\lambda_2$,
and $\lambda_3$, with eigenvectors $x_1$, $x_2$, and $x_3$, the same conclusion holds: the
x's are linearly independent. Reason: if

$$\alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_3 = 0,$$

apply $A - \lambda_1$ to infer

$$\alpha_2(\lambda_2 - \lambda_1)x_2 + \alpha_3(\lambda_3 - \lambda_1)x_3 = 0,$$

and then apply $A - \lambda_2$ to infer

$$\alpha_3(\lambda_3 - \lambda_1)(\lambda_3 - \lambda_2)x_3 = 0.$$

That implies $\alpha_3 = 0$ (because $(\lambda_3 - \lambda_1)(\lambda_3 - \lambda_2) \neq 0$ and $x_3 \neq 0$). Continue
the same way: apply first $A - \lambda_1$ and then $A - \lambda_3$ to get $\alpha_2 = 0$, and by
obvious small modifications of these steps get $\alpha_1 = 0$.

The general case, for an arbitrary n, should now be obvious, and from 
it diagonalization follows. Indeed, once it is known that a transformation 
on an n-dimensional space has n linearly independent eigenvectors, then 
its matrix with respect to the basis those vectors form is diagonal—and in 
this last step it no longer even matters that the eigenvalues are distinct. 


Comment. Here is a minute but enchanting corollary of the result: a $2 \times 2$
real matrix with negative determinant is diagonalizable. Reason: since the
characteristic polynomial is a quadratic real polynomial with leading
coefficient 1 and negative constant term, the quadratic formula implies that the
two eigenvalues are distinct.


Solution 107. 


Suppose that A is a linear transformation on a finite-dimensional vector
space and that $\lambda_0$ is one of its eigenvalues with eigenspace $M_0$. If x belongs
to $M_0$,

$$Ax = \lambda_0 x,$$

then

$$A(Ax) = A^2x = \lambda_0^2 x = \lambda_0(\lambda_0 x) = \lambda_0 \cdot Ax,$$

which says that Ax belongs to $M_0$. In other words, the subspace $M_0$ is
invariant under A. If $A_0$ is the linear transformation A considered on $M_0$
only (the restriction of A to $M_0$), then the polynomial $\det(A_0 - \lambda)$ is a
factor of the polynomial $\det(A - \lambda)$. (Why?) If the dimension of $M_0$
(= the geometric multiplicity of $\lambda_0$) is $m_0$, then

$$\det(A_0 - \lambda) = (\lambda_0 - \lambda)^{m_0},$$

and it follows that $(\lambda_0 - \lambda)$ occurs as a factor of $\det(A - \lambda)$ with an
exponent m greater than or equal to $m_0$. That's it: the assertion $m \geq m_0$ says
exactly that geometric multiplicity is always less than or equal to algebraic
multiplicity.


Comment. What can be said about a transformation for which the alge- 
braic multiplicity of every eigenvalue is equal to 1? In view of the present 
result the answer is that the geometric multiplicity of every eigenvalue is 
equal to 1, and hence that the number of eigenvalues is equal to the di- 
mension. Conclusion (see Problem 106): the matrix is diagonalizable. 


Solution 108. 


The calculation of the characteristic polynomials is easy enough:

$$\det(A - \lambda) = (1 - \lambda)(2 - \lambda)(6 - \lambda) + 3 + 6(1 - \lambda) + (6 - \lambda)$$

and

$$\det(B - \lambda) = (1 - \lambda)(5 - \lambda)(3 - \lambda) + 4(3 - \lambda).$$

These both work out to

$$-\lambda^3 + 9\lambda^2 - 27\lambda + 27,$$

which is equal to

$$(3 - \lambda)^3.$$

It follows that both A and B have only one eigenvalue, namely $\lambda = 3$, of
algebraic multiplicity 3, and, on the evidence so far available, it is possible
to guess that A and B are similar.

What are the eigenvectors of A? To have $Au = 3u$, where $u = (\alpha, \beta, \gamma)$,
means having

$$\begin{aligned}
\alpha + \beta &= 3\alpha \\
-\alpha + 2\beta + \gamma &= 3\beta \\
3\alpha - 6\beta + 6\gamma &= 3\gamma.
\end{aligned}$$

These equations are easy to solve; it turns out that the only solutions are the
vector $u = (1, 2, 3)$ and its scalar multiples. Consequence: the eigenvalue 3
of A has geometric multiplicity 1.

For B the corresponding equations are

$$\begin{aligned}
\alpha + \beta &= 3\alpha \\
-4\alpha + 5\beta &= 3\beta \\
-6\alpha + 3\beta + 3\gamma &= 3\gamma.
\end{aligned}$$

The eigenspace of the eigenvalue 3 is 2-dimensional this time; it is the set of
all vectors of the form $(\alpha, 2\alpha, \gamma)$. Consequence: the eigenvalue 3 of B has
geometric multiplicity 2. Partial conclusion: A and B are not similar.

The upper triangular form of both A and B must be something like

$$\begin{pmatrix} 3 & \alpha & \gamma \\ 0 & 3 & \beta \\ 0 & 0 & 3 \end{pmatrix}.$$

Even a little experience with similarity indicates that that form is not
uniquely determined; the discussion of Problem 94 shows that similarity can
effect radical changes in the stuff above the diagonal (see in particular part
(b)). Here is a pertinent special example that fairly illustrates the general case:

$$\begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 3 & 1 & 1 \\ 0 & 3 & 1 \\ 0 & 0 & 3 \end{pmatrix}
\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 1 \\ 0 & 0 & 3 \end{pmatrix}.$$

In view of these comments it is not unreasonable to restrict the search for
triangular forms to those whose top right corner entry is 0.
For A, the search is for vectors u, v, and w such that

$$Au = 3u, \qquad Av = u + 3v, \qquad\text{and}\qquad Aw = v + 3w.$$

As for u, that's already at hand: that's the eigenvector (1, 2, 3) found
above.

If $v = (\alpha, \beta, \gamma)$ (the notation of the calculation that led to u is now
abandoned), then the equation for v says that

$$\begin{aligned}
\alpha + \beta &= 3\alpha + 1 \\
-\alpha + 2\beta + \gamma &= 3\beta + 2 \\
3\alpha - 6\beta + 6\gamma &= 3\gamma + 3.
\end{aligned}$$

These equations are just as easy to solve as the ones that led to u. Their
solutions are the vectors of the form $(\alpha, 2\alpha + 1, 3\alpha + 3)$, a space of
dimension 1. One of them (one is enough) is (0, 1, 3); call that one v.
If $w = (\alpha, \beta, \gamma)$ (another release of old notation), then the equation for w
becomes

$$\begin{aligned}
\alpha + \beta &= 3\alpha \\
-\alpha + 2\beta + \gamma &= 3\beta + 1 \\
3\alpha - 6\beta + 6\gamma &= 3\gamma + 3.
\end{aligned}$$

The solutions of these equations are the vectors of the form $(\alpha, 2\alpha, 3\alpha + 1)$,
a typical one of which (with $\alpha = 1$) is $w = (1, 2, 4)$.

The vectors u, v, and w so obtained constitute a basis; the matrix of A
with respect to that basis is

$$\begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 1 \\ 0 & 0 & 3 \end{pmatrix},$$

as it should be.
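
The three defining relations for u, v, and w can be confirmed directly. The sketch below assumes that A is the matrix whose rows are the coefficients of the linear equations written out for $Au = 3u$ above; the helper names `apply` and `det3` are hypothetical.

```python
# A is read off from the left sides of the equations for Au = 3u.
A = [[1, 1, 0],
     [-1, 2, 1],
     [3, -6, 6]]

def apply(matrix, vector):
    """Multiply a square matrix (a list of rows) by a vector."""
    return [sum(row[k] * vector[k] for k in range(len(vector))) for row in matrix]

def det3(c1, c2, c3):
    """Determinant of the 3 x 3 matrix with columns c1, c2, c3."""
    return (c1[0] * (c2[1] * c3[2] - c3[1] * c2[2])
            - c2[0] * (c1[1] * c3[2] - c3[1] * c1[2])
            + c3[0] * (c1[1] * c2[2] - c2[1] * c1[2]))

u = [1, 2, 3]
v = [0, 1, 3]
w = [1, 2, 4]

chain_ok = (
    apply(A, u) == [3 * a for a in u]                        # Au = 3u
    and apply(A, v) == [a + 3 * b for a, b in zip(u, v)]     # Av = u + 3v
    and apply(A, w) == [a + 3 * b for a, b in zip(v, w)]     # Aw = v + 3w
)
basis_determinant = det3(u, v, w)   # non-zero exactly when u, v, w form a basis
print(chain_ok, basis_determinant)
```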
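The procedure for B is entirely similar. Begin with the eigenvector u =
(1, 2, 3) with eigenvalue 3, and then look for a vector v such that

$$Bv = u + 3v.$$

If $v = (\alpha, \beta, \gamma)$, this equation becomes

$$\begin{aligned}
\alpha + \beta &= 3\alpha + 1 \\
-4\alpha + 5\beta &= 3\beta + 2 \\
-6\alpha + 3\beta + 3\gamma &= 3\gamma + 3,
\end{aligned}$$

and the solutions of that are the vectors of the form $(\alpha, 2\alpha + 1, \gamma)$. If v is the
one with $\alpha = 0$ and $\gamma = 3$, so that v = (0, 1, 3), and if w is an eigenvector
of B that is not a multiple of u, say w = (0, 0, 1), then the vectors u, v, and w constitute
a basis, and the matrix of B with respect to that basis is

$$\begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{pmatrix}.$$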

Solution 109. 


If n is 2, the answer is trivially yes. If the question concerned $\mathbb{C}^n$ instead
of $\mathbb{R}^n$ (with the understanding that in the complex case the dimension
being asked about is the complex dimension), the answer would be easily yes
again; just triangularize and look. One way of proving that the answer to
the original question is yes for every n is to "complexify" $\mathbb{R}^n$ and the
linear transformations that act on it. There are sophisticated ways of doing
that for completely general real vector spaces, but in the case of $\mathbb{R}^n$ there
is hardly anything to do. Just recall that if A is a linear transformation on
$\mathbb{R}^n$, then A can be defined by a matrix (with real entries, of course), and
such a matrix defines at the same time a linear transformation (call it $A^+$)
on $\mathbb{C}^n$.

The linear transformation $A^+$ on $\mathbb{C}^n$ has an eigenvalue and a
corresponding eigenvector; that is

$$A^+ z = \lambda z$$

for some complex number $\lambda$ and for some vector z in $\mathbb{C}^n$. Consider the
real and imaginary parts of the complex number $\lambda$ and, similarly, separate
out the real and imaginary parts of the coordinates of the vector z. Some
notation would be helpful; write

$$\lambda = \alpha + i\beta,$$

with $\alpha$ and $\beta$ real, and

$$z = x + iy,$$

with x and y in $\mathbb{R}^n$. Since

$$A^+(x + iy) = (\alpha + i\beta)(x + iy),$$

it follows that

$$Ax = \alpha x - \beta y$$

and

$$Ay = \beta x + \alpha y.$$

There it is; that implies the desired conclusion: the subspace of $\mathbb{R}^n$
spanned by x and y is invariant under A.
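
Here is the separation into real and imaginary parts carried out on one example. It is a sketch: the real matrix A and its complex eigenvector z are hypothetical samples (the eigenvector was found by hand for this matrix), not data from the problem.

```python
# A sample real 3 x 3 matrix with the non-real eigenvalue 2 + i.
A = [[1, -2, 0],
     [1, 3, 0],
     [0, 0, 5]]

def apply(matrix, vector):
    """Multiply a square matrix (a list of rows) by a vector."""
    return [sum(row[k] * vector[k] for k in range(len(vector))) for row in matrix]

def close(first, second):
    """Coordinatewise comparison up to rounding."""
    return all(abs(a - b) < 1e-12 for a, b in zip(first, second))

eigenvalue = 2 + 1j
z = [2, -1 - 1j, 0]          # A z = (2 + i) z, checked below

alpha, beta = eigenvalue.real, eigenvalue.imag
x = [complex(c).real for c in z]   # real part of z
y = [complex(c).imag for c in z]   # imaginary part of z

is_eigenvector = close(apply(A, z), [eigenvalue * c for c in z])
ax_ok = close(apply(A, x), [alpha * a - beta * b for a, b in zip(x, y)])
ay_ok = close(apply(A, y), [beta * a + alpha * b for a, b in zip(x, y)])
print(is_eigenvector, ax_ok, ay_ok)
```

Both Ax and Ay are combinations of x and y, which is exactly the invariance of the span of x and y.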


Solution 110. 


Yes, if a linear transformation A on a finite-dimensional (complex) vector
space is such that $A^k = 1$ for some positive integer k, then A is
diagonalizable. Here is the reasoning.

The assumption implies that every eigenvalue $\lambda$ of A is a kth root of
unity. Consequence: each block in a triangularization of A is of the form
$\lambda + T$, where $\lambda^k = 1$ and where

$$T = \begin{pmatrix} 0 & * & * & * \\ 0 & 0 & * & * \\ 0 & 0 & 0 & * \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

is strictly upper triangular. By the binomial theorem,

$$(\lambda + T)^k = 1 + k\lambda^{k-1}T + \cdots,$$

where the possible additional terms do not contribute to the lowest non-zero
diagonal of T. Conclusion: $(\lambda + T)^k$ can be 1 only when T = 0, that
is, only when each block in the triangularization is diagonal.


Solution 111. 


Since $M(x)$ is spanned by the q vectors

$$x, Ax, A^2x, \ldots, A^{q-1}x,$$

its dimension cannot be more than q; the answer to the question is that
for an intelligently chosen x that dimension can actually attain the value
q. The intelligent choice is not too difficult. Since the index of A is q, there
must exist at least one vector $x_0$ such that

$$A^{q-1}x_0 \neq 0,$$

and each such vector constitutes an intelligent choice.
The assertion is that if

$$\alpha_0 x_0 + \alpha_1 Ax_0 + \alpha_2 A^2x_0 + \cdots + \alpha_{q-1}A^{q-1}x_0 = 0,$$

then each $\alpha_j$ must be 0. If that is not true, then choose the smallest index
j such that $\alpha_j \neq 0$. (If $\alpha_0 \neq 0$, then of course j = 0.) It makes life a little
simpler now to normalize the assumed linear dependence equation: divide
through by $\alpha_j$ and transpose all but $A^j x_0$ to the right side. The result is an
equation that expresses $A^j x_0$ as a linear combination of vectors obtained
from $x_0$ by applying the higher powers of A (that is, the powers $A^k$ with
$k \geq j + 1$). Consequence:

$$A^j x_0 = A^{j+1}y$$

for some y. Since

$$A^{q-1}x_0 = A^{q-j-1}A^j x_0 = A^{q-j-1}A^{j+1}y = A^q y = 0$$

(the last equal sign is justified by the assumption that A is nilpotent of
index q), a contradiction has arrived. (Remember the choice of $x_0$.) Since
the only possibly shaky step that led here was the choice of j, the forced
conclusion is that that choice is not possible. In other words, there is no
smallest index j for which $\alpha_j \neq 0$, which says that $\alpha_j = 0$ for all j.

Corollary. The index of nilpotence of a transformation on a space of
dimension n can never be greater than n.


Solution 112. 


Perhaps somewhat surprisingly, the answer depends on size. If the dimension
of the underlying space is 2, or, equivalently, if A and B denote $2 \times 2$
matrices, then AB and BA always have the same characteristic polynomial,
and it follows that if AB is nilpotent, then so is BA. If a matrix of
size 2 is nilpotent, then its index of nilpotence is less than or equal to 2.
For $3 \times 3$ matrices the conclusion is false. If, for instance,

$$A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix},$$

then

$$AB = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$

is nilpotent of index 2, but BA = B is not; it is nilpotent of index 3.
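
The two indices of nilpotence can be computed by brute force. The sketch below takes A to be the diagonal matrix with entries 1, 1, 0 and B to be the shift with 1's just below the diagonal; the function name `nilpotency_index` is hypothetical.

```python
def matmul(left, right):
    """Product of two square matrices given as lists of rows."""
    n = len(left)
    return [[sum(left[i][k] * right[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def nilpotency_index(matrix):
    """Smallest k with matrix^k = 0, or None if no power up to the size vanishes."""
    power = matrix
    for k in range(1, len(matrix) + 1):
        if all(entry == 0 for row in power for entry in row):
            return k
        power = matmul(power, matrix)
    return None

A = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 0]]
B = [[0, 0, 0],
     [1, 0, 0],
     [0, 1, 0]]

AB = matmul(A, B)
BA = matmul(B, A)
print(nilpotency_index(AB), nilpotency_index(BA), BA == B)
```

Stopping the search at the size of the matrix is justified by the fact that the index of nilpotence never exceeds the dimension.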


Solution 113. 


The result of applying M to a vector $(\alpha, \beta, \gamma, \delta, \varepsilon)$ is $(\beta + \delta, \gamma - \varepsilon, 0, \varepsilon, 0)$.
When is that 0, or, in other words, which vectors are in the kernel of M?
Answer: $\varepsilon$ must be 0, hence $\gamma$ must be 0, and $\beta + \delta$ must be 0. So: the kernel
consists of all vectors of the form

$$(\alpha, \beta, 0, -\beta, 0),$$


a subspace of dimension 2. In view of this observation, and in view of the 
given form of M, a reasonable hope is to begin the desired basis with 


(1,0,0,0,0) 
(0,1,0,0,0) 
(0, 0, 1,0,0) 
and 
(0, —1,0, 1,0). 


What is wanted for a fifth vector is one whose image under M is
$(0, -1, 0, 1, 0)$. Since the image of $(\alpha, \beta, \gamma, \delta, \varepsilon)$ is
$(\beta + \delta, \gamma - \varepsilon, 0, \varepsilon, 0)$, what
is wanted is to have $\beta + \delta = 0$, $\gamma - \varepsilon = -1$, and $\varepsilon = 1$. These equations
have many solutions; the simplest among them is


(0,0, 0, 0, 1). 


That’s it: the last five displayed vectors do the job. 
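
What M does to the five vectors can be tabulated in a few lines. This is a sketch; it assumes only the coordinate formula for M stated in this solution.

```python
def M(vector):
    """The transformation of this solution: (a, b, c, d, e) -> (b + d, c - e, 0, e, 0)."""
    a, b, c, d, e = vector
    return (b + d, c - e, 0, e, 0)

basis = [
    (1, 0, 0, 0, 0),
    (0, 1, 0, 0, 0),
    (0, 0, 1, 0, 0),
    (0, -1, 0, 1, 0),
    (0, 0, 0, 0, 1),
]

images = [M(v) for v in basis]
for v, image in zip(basis, images):
    print(v, "->", image)
```

The first and fourth vectors are killed, the second and third are shifted back to their predecessors, and the fifth is sent to the fourth; that is a chain of length 3 together with a chain of length 2.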


Solution 114. 


The answer is no but yes. No, not every matrix has a square root, but the 
reason is obvious (once you see it), and there is a natural way to get around 
the obstacle. 

An example of a matrix with no square root is

$$A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}.$$

(So is $\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$, but the larger example gives a little more of an idea of why
it works.) If, indeed, it were true that $A = B^2$, then (since $A^3 = 0$) it would
follow that $B^6 = 0$, and hence that B is nilpotent. A nilpotent matrix of
size $3 \times 3$ must have index less than or equal to 3 (since the index is always
less than or equal to the dimension), and that implies $B^3 = 0$, and since
$B^4 = A^2 \neq 0$, that is a contradiction.

What's wrong? The answer is 0. People familiar with the theory of
multivalued analytic functions know that the point z = 0 is one at which the
function defined by $\sqrt{z}$ misbehaves; the better part of valor dictates that in
the study of square roots anything like 0 should be avoided. What in matrix
theory is "anything like 0"? Reasonable looking answer: matrices that
have 0 in their spectrum. How are they to be avoided? Answer: by sticking
to invertible matrices. Very well then: does every invertible matrix have a
square root?

Here is where the Jordan form can be used to good advantage. Every
invertible matrix is similar to a direct sum of matrices such as

$$\begin{pmatrix} \lambda & 1 & 0 & 0 \\ 0 & \lambda & 1 & 0 \\ 0 & 0 & \lambda & 1 \\ 0 & 0 & 0 & \lambda \end{pmatrix}$$

with $\lambda \neq 0$, and, consequently, it is sufficient to decide whether or not every
matrix of that form has a square root.

The computations are somewhat easier in case $\lambda = 1$, and it is possible
to reduce to that case simply by dividing by $\lambda$. When that is done, the 1's
above the diagonal turn into $1/\lambda$'s, to be sure, but in that position they cause
no trouble. So the problem is to find a square root for something like

$$\begin{pmatrix} 1 & \alpha & 0 & 0 \\ 0 & 1 & \alpha & 0 \\ 0 & 0 & 1 & \alpha \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

One way to do that is to look for a square root of the form

$$\begin{pmatrix} 1 & \xi & \eta & \zeta \\ 0 & 1 & \xi & \eta \\ 0 & 0 & 1 & \xi \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

Set the square of that matrix equal to the given one and look for solutions
$\xi$, $\eta$, $\zeta$ of the resulting equations. That works!

There is a more sophisticated approach. Think of the given matrix as
1 + M, where

$$M = \begin{pmatrix} 0 & \alpha & 0 & 0 \\ 0 & 0 & \alpha & 0 \\ 0 & 0 & 0 & \alpha \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$

The reason that's convenient is that it makes possible the application of
facts about the function $\sqrt{1 + \zeta}$.

As is well known, Professor Moriarty "wrote a treatise upon the binomial
theorem, which has had a European vogue"; the theorem asserts that
the power series expansion of $(1 + \zeta)^{\xi}$ is

$$(1 + \zeta)^{\xi} = 1 + \binom{\xi}{1}\zeta + \binom{\xi}{2}\zeta^2 + \binom{\xi}{3}\zeta^3 + \cdots.$$

(Here a binomial coefficient such as, for instance, $\binom{\xi}{3}$ denotes

$$\frac{\xi(\xi - 1)(\xi - 2)}{3!},$$

and the parameter $\xi$ can be any real number.) The series converges for
some values of $\zeta$ and does not converge for others, but, for the moment,
none of that matters. What does matter is that the equation is "formally"
right. That means, for instance, that if the series for $\xi = \frac{1}{2}$ is multiplied
by itself, then the constant term and the coefficient of $\zeta$ turn out to be 1
and all other coefficients turn out to be 0: the product is exactly $1 + \zeta$.
In the application that is about to be made the variable $\zeta$ will be replaced
by a nilpotent matrix, so that only a finite number of non-zero terms will
appear, and in that case convergence is not a worry.

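All right: consider the series with $\xi = \frac{1}{2}$, and replace the variable $\zeta$ by
the matrix M. The result is

$$\begin{pmatrix}
1 & \frac{\alpha}{2} & -\frac{\alpha^2}{8} & \frac{\alpha^3}{16} \\
0 & 1 & \frac{\alpha}{2} & -\frac{\alpha^2}{8} \\
0 & 0 & 1 & \frac{\alpha}{2} \\
0 & 0 & 0 & 1
\end{pmatrix}$$

(check?), and that works, meaning that its square is 1 + M (check?). So,
one way or another, it is indeed true that every invertible matrix has a
square root.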
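
Both invitations to "check?" can be accepted with exact rational arithmetic. The sketch below builds the first four terms of the binomial series with exponent $\tfrac{1}{2}$ and squares the result; the value chosen for $\alpha$ is a hypothetical sample, and the helper name `generalized_binomial` is introduced only for this illustration.

```python
from fractions import Fraction

def matmul(left, right):
    """Product of two square matrices given as lists of rows."""
    n = len(left)
    return [[sum(left[i][k] * right[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def generalized_binomial(xi, k):
    """Binomial coefficient xi (xi - 1) ... (xi - k + 1) / k! for rational xi."""
    value = Fraction(1)
    for i in range(k):
        value *= (xi - i) / (i + 1)
    return value

n = 4
alpha = Fraction(3)  # sample value for the entries above the diagonal
ONE = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
M = [[alpha if j == i + 1 else Fraction(0) for j in range(n)] for i in range(n)]

# Partial sum of the series for (1 + M)^(1/2); M^4 = 0, so four terms suffice.
root = [[Fraction(0)] * n for _ in range(n)]
power = ONE
for k in range(n):
    coefficient = generalized_binomial(Fraction(1, 2), k)
    root = [[root[i][j] + coefficient * power[i][j] for j in range(n)]
            for i in range(n)]
    power = matmul(power, M)

target = [[ONE[i][j] + M[i][j] for j in range(n)] for i in range(n)]
print(matmul(root, root) == target)
print(root[0])
```

The first row printed at the end consists of 1, $\alpha/2$, $-\alpha^2/8$, and $\alpha^3/16$ evaluated at the sample value of $\alpha$.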


Solution 115. 


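The differentiation transformation D is nilpotent of index 4 (the dimension
of the space). Consequence: both the minimal polynomial and the
characteristic polynomial are equal to $\lambda^4$.

As for T, its only eigenvalue is 1. Indeed: if

$$\alpha + \beta(t + 1) + \gamma(t + 1)^2 + \delta(t + 1)^3 = \lambda(\alpha + \beta t + \gamma t^2 + \delta t^3),$$

then

$$\begin{aligned}
\alpha + \beta + \gamma + \delta &= \lambda\alpha, \\
\beta + 2\gamma + 3\delta &= \lambda\beta, \\
\gamma + 3\delta &= \lambda\gamma, \\
\delta &= \lambda\delta.
\end{aligned}$$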


It follows that if $\lambda \neq 1$, then

$$\delta = 0,$$

and therefore

$$\gamma = \beta = \alpha = 0.$$

On the other hand if $\lambda = 1$, then

$$\begin{aligned}
\beta + \gamma + \delta &= 0, \\
2\gamma + 3\delta &= 0, \\
3\delta &= 0,
\end{aligned}$$

and therefore

$$\beta = \gamma = \delta = 0.$$

(Another way to get here is to look at the matrix in Solution 108.) Conclusion:
both the minimal polynomial and the characteristic polynomial are
$(\lambda - 1)^4$.
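
The claim about the minimal polynomial of T amounts to saying that $T - 1$ is nilpotent of index exactly 4, and that can be confirmed by direct multiplication. The sketch assumes that T is the substitution $p(t) \mapsto p(t + 1)$ on polynomials of degree at most 3, written with respect to the basis $1, t, t^2, t^3$, as the coefficient equations above indicate.

```python
from math import comb

def matmul(left, right):
    """Product of two square matrices given as lists of rows."""
    n = len(left)
    return [[sum(left[i][k] * right[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_zero(matrix):
    return all(entry == 0 for row in matrix for entry in row)

n = 4
# Column j holds the coefficients of (t + 1)^j in the basis 1, t, t^2, t^3.
T = [[comb(j, i) for j in range(n)] for i in range(n)]
N = [[T[i][j] - (1 if i == j else 0) for j in range(n)] for i in range(n)]  # T - 1

N3 = matmul(matmul(N, N), N)
N4 = matmul(N3, N)
print(T)
print(is_zero(N3), is_zero(N4))
```

Since the third power of $T - 1$ is not zero and the fourth is, no polynomial of degree less than 4 in $\lambda - 1$ can annihilate T.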


Solution 116. 


Yes, it's always true that one polynomial can do on each of n prescribed 
transformations what n prescribed polynomials do. The case n — 2 is typ- 
ical and notationally much less cumbersome; here is how it goes. 

Given: two linear transformations A and B with disjoint spectra, and 
two polynomials p and q. Wanted: a polynomial r such that 


r(A) = p(A) 
and 
r(B) = q(B). 


If there is such a polynomial r, then the difference r − p annihilates
A. The full annihilator of A, that is the set of all polynomials f such that
f(A) = 0, is an ideal in the ring of all complex polynomials; every such
polynomial is a multiple of the minimal polynomial p₀ of A. Consequence:
if there is an r of the kind sought, then

\[ r = s p_0 + p \]

for some polynomial s, and, similarly,

\[ r = t q_0 + q \]

for some polynomial t, where q₀ is the minimal polynomial of B. Conversely,
clearly, any sp₀ + p maps A onto p(A), and any tq₀ + q maps B onto q(B);
the problem is to find an r that is simultaneously an sp₀ + p and a
tq₀ + q. In other words, the problem is to find polynomials s and t such
that

\[ s p_0 - t q_0 = q - p. \]

Since p₀ and q₀ are relatively prime (this is the step that uses the assumed
disjointness of the spectra of A and B), it is a standard consequence of the
Euclidean algorithm that such polynomials s and t do exist.
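
A small worked instance may make the recipe concrete. The example is hypothetical (it is not taken from the text): A is the 2 × 2 nilpotent matrix with p₀ = λ², B is the 1 × 1 matrix (1) with q₀ = λ − 1, and the prescribed polynomials are p = 1 and q = 0. Since λ² − (λ + 1)(λ − 1) = 1, the choice s = −1, t = −(λ + 1) satisfies sp₀ − tq₀ = q − p, and r = sp₀ + p = 1 − λ². The sketch below checks that r(A) = p(A) and r(B) = q(B).

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def r(X):
    """Evaluate r(lambda) = 1 - lambda^2 at the square matrix X."""
    n = len(X)
    one = identity(n)
    X2 = matmul(X, X)
    return [[one[i][j] - X2[i][j] for j in range(n)] for i in range(n)]

A = [[0, 1], [0, 0]]   # minimal polynomial lambda^2, spectrum {0}
B = [[1]]              # minimal polynomial lambda - 1, spectrum {1}

print(r(A) == identity(2))  # r(A) = p(A) with p = 1
print(r(B) == [[0]])        # r(B) = q(B) with q = 0
```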

The general case (n > 2) can be treated either by imitating the special 
case or else by induction. Here is how the induction argument goes. 

Assume the conclusion for n, and pass to n + 1 as follows. By the
induction hypothesis, there is a polynomial p such that

\[ p(A_j) = p_j(A_j) \]

for j = 1, …, n. Write

\[ A = A_1 \oplus \cdots \oplus A_n \]

(direct sum),

\[ B = A_{n+1}, \]

and

\[ q = p_{n+1}. \]

Note that the spectra of A and B are disjoint (because the spectrum of A
is the union of the spectra of the A_j's, j = 1, …, n), and therefore, by
the case n = 2 of the theorem, there exists a polynomial r such that

\[ r(A) = p(A) \]

and

\[ r(B) = q(B). \]

Once the notation is unwound, these equations become

\[ r(A_1) \oplus \cdots \oplus r(A_n) = p_1(A_1) \oplus \cdots \oplus p_n(A_n) \]

and

\[ r(A_{n+1}) = p_{n+1}(A_{n+1}). \]

The first of these equations implies that

\[ r(A_j) = p_j(A_j) \]

for j = 1, …, n, and that concludes the proof.

The result holds for all fields, not only C, provided that the hypothesis 
of the disjointness of spectra is replaced by its algebraically more usable 
version, namely the pairwise relative primeness of the minimal polynomi- 
als of the given transformations. 
Chapter 8. Inner Product Spaces


Solution 117. 


An orthogonal set of non-zero vectors is always linearly independent. (The 
case in which one of them is zero is degenerate—then, of course, they are 
dependent.) Indeed, if 


\[ \alpha_1 x_1 + \cdots + \alpha_n x_n = 0, \]

form the inner product of both sides of the equation with any x_j and get

\[ \alpha_j (x_j, x_j) = 0. \]

The reason is that if i ≠ j, then the inner product (x_i, x_j) is 0; that's
what the assumed orthogonality says. Since (x_j, x_j) ≠ 0 (by the assumed
non-zeroness), it follows that α_j = 0: every linear dependence relation
must be trivial.


Solution 118. 


The answer to the question as posed is no: different inner products must yield 
different norms. The proof is a hard one to discover but a boring one to 
verify—the answer is implied by the equation 


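\[ (x, y) = \left\| \tfrac12 (x + y) \right\|^2 - \left\| \tfrac12 (x - y) \right\|^2
   + i \left\| \tfrac12 (x + iy) \right\|^2 - i \left\| \tfrac12 (x - iy) \right\|^2, \]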


which is called the polarization formula. It might be somewhat frighten- 
ing when first encountered, but it doesn't take long to understand, and once 
it's absorbed it is useful—it is worth remembering, or, at the very least, its 
existence is worth remembering. 
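
The polarization formula is easy to test numerically. The following is a minimal sketch on C², assuming the standard inner product, linear in the first variable and conjugate-linear in the second; the sample vectors and the function names are arbitrary illustrations.

```python
def inner(x, y):
    """Inner product on C^n, linear in x and conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm_sq(x):
    return inner(x, x).real

def half(x, y, c):
    """Return the vector (x + c*y)/2."""
    return [(a + c * b) / 2 for a, b in zip(x, y)]

def polarization(x, y):
    """Recover (x, y) from four squared norms."""
    return (norm_sq(half(x, y, 1)) - norm_sq(half(x, y, -1))
            + 1j * norm_sq(half(x, y, 1j)) - 1j * norm_sq(half(x, y, -1j)))

x = [1 + 2j, 3 - 1j]
y = [2 - 1j, 1j]
print(inner(x, y), polarization(x, y))
```

The two printed numbers agree (up to rounding), which is what the formula asserts: the norm determines the inner product.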


Solution 119. 
What is always true is that

\[ \|x + y\|^2 = \|x\|^2 + (x, y) + (y, x) + \|y\|^2. \]

For real vector spaces the two cross product terms are equal; the equation

\[ \|x + y\|^2 = \|x\|^2 + \|y\|^2 \]

is equivalent to (x, y) = 0, and all is well.

In complex vector spaces, however, (y, x) is the complex conjugate of
(x, y); the sum of the two cross product terms is 2 Re(x, y). The equation
between norms is equivalent to Re(x, y) = 0, and that is not the same as
orthogonality. An obvious way to try to construct a concrete counterexample
is to start with an arbitrary vector x and set y = ix. In that case

\[ \|x + y\|^2 = \|(1 + i)x\|^2 = 2\|x\|^2 = \|x\|^2 + \|y\|^2, \]

but (except in the degenerate case x = 0) the vectors x and y are not
orthogonal.
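
The counterexample y = ix can be tried on a specific vector. This is a minimal sketch on C² with the standard inner product; the vector x is an arbitrary choice.

```python
def inner(x, y):
    """Inner product on C^n, linear in x and conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm_sq(x):
    return inner(x, x).real

x = [1, 2 - 1j]
y = [1j * a for a in x]               # y = ix
s = [a + b for a, b in zip(x, y)]     # x + y

print(norm_sq(s), norm_sq(x) + norm_sq(y))  # the Pythagorean equation holds
print(inner(x, y))                          # yet (x, y) is not 0
```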


Solution 120. 
Multiply out ‖x + y‖² + ‖x − y‖², get

\[ \|x\|^2 + (x, y) + (y, x) + \|y\|^2 + \|x\|^2 - (x, y) - (y, x) + \|y\|^2, \]

and conclude that the equation in the statement of the problem is in fact
an identity, true for all vectors x and y in all inner product spaces.


Solution 121. 


Yes, every inner product space of dimension n has an orthonormal set of
n elements. Indeed consider, to begin with, an arbitrary orthonormal set.
If no larger one jumps to the eye, a set with one element will do: take an
arbitrary non-zero vector x, and normalize it (that is, replace it by
x/‖x‖). If the orthonormal set on hand is not maximal, enlarge it, and if
the resulting orthonormal set is still not maximal, enlarge it again, and
proceed in this way by induction. Since an orthonormal set can contain at
most n elements (Problem 117), this process leads to a complete
orthonormal set in at most n steps.

Assertion: such a set spans the whole space. Reason: if the set is
{x₁, x₂, …}, and if some vector x is not a linear combination of the x_j's,
then form the vector

\[ y = x - \sum_j (x, x_j) x_j. \]

The assumption about x implies that y ≠ 0. Since, moreover,

\[ (y, x_i) = (x, x_i) - \sum_j (x, x_j)\delta_{ji} = (x, x_i) - (x, x_i) = 0, \]

so that y is orthogonal to each of the x_i's, the normalized vector y/‖y‖,
when adjoined to the x_i's, leads to a larger orthonormal set. That's a
contradiction, and, therefore, the x_i's do indeed span the space. Since
they are also linearly independent (Problem 117 keeps coming up), it
follows that they constitute a basis, and hence that there must be n of
them.


Comment. There is a different way to express the proof, a more
constructive way. The idea is to start with a basis {x₁, …, xₙ} and by
continued modifications convert it to an orthonormal set. Here is an
outline of how that goes. Since x₁ ≠ 0, it is possible to form
y₁ = x₁/‖x₁‖. Once y₁, …, y_r have been found so that each y_j is a linear
combination of x₁, …, x_j, form

\[ x_{r+1} - \sum_{j=1}^{r} (x_{r+1}, y_j)\, y_j, \]

verify that it is linearly independent of y₁, …, y_r, and normalize it.
These steps are known as the Gram-Schmidt orthogonalization process.
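
The outline above can be turned into a short program. The following is a minimal sketch, assuming vectors in Cⁿ with the standard inner product (linear in the first variable); the function names and the sample basis are illustrative, and no attention is paid to numerical stability.

```python
from math import sqrt

def inner(x, y):
    """Inner product on C^n, linear in x and conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    return sqrt(inner(x, x).real)

def gram_schmidt(basis):
    """Convert a linearly independent list of vectors to an orthonormal list."""
    ortho = []
    for x in basis:
        v = list(x)
        # Subtract the components of x along the y's found so far.
        for y in ortho:
            c = inner(x, y)
            v = [a - c * b for a, b in zip(v, y)]
        length = norm(v)
        if length < 1e-12:
            raise ValueError("the given vectors are linearly dependent")
        ortho.append([a / length for a in v])
    return ortho

ys = gram_schmidt([[1, 1, 0], [1, 0, 1], [0, 1, 1j]])
for i in range(3):
    print([round(abs(inner(ys[i], ys[j])), 10) for j in range(3)])
```

The printed table of absolute values of inner products is the identity matrix, up to rounding.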


Solution 122. 


If x and y are the same vector, then both sides of the Schwarz inequality
are equal to ‖x‖². More generally if one of x and y is a scalar multiple
of the other (in that case there is no loss of generality in assuming that
y = αx), then both sides of the inequality are equal to |α| · ‖x‖². If x
and y are linearly dependent, then one of them is a scalar multiple of the
other. In all these cases the Schwarz inequality becomes an equation. Can
the increasing generality of this sequence of statements be increased still
further? The answer is no: the Schwarz inequality can become an equation
for linearly dependent pairs of vectors only.

One proof of that assertion is by black magic, as follows. If

\[ |(x, y)| = \|x\| \cdot \|y\|, \]

replace x by γx, where γ is a complex number of absolute value 1 chosen
so that γ(x, y) is real and non-negative. The assumed "Schwarz equation" is
still true, but with the new x (and the same old y) it takes the form

\[ (x, y) = \|x\| \cdot \|y\|. \]

This is not an important step; it just makes the black magic that follows a
tiny bit more mysterious still. Once that step has been taken, evaluate the
expression

\[ \bigl\| \, \|y\| x - \|x\| y \, \bigr\|^2. \]

Since it is equal to

\[ \bigl( \|y\| x - \|x\| y, \; \|y\| x - \|x\| y \bigr)
   = \|y\|^2 \|x\|^2 - 2 \|x\|^2 \|y\|^2 + \|x\|^2 \|y\|^2 = 0, \]

it follows that ‖y‖x − ‖x‖y = 0, which is indeed a linear dependence
between x and y.

One reason why the Schwarz inequality is true, and why equality happens
only in the presence of linear dependence, can be seen by looking at
simple special cases. Look, for instance, at two vectors in R², say
x = (α, β) and y = (1, 0). Then

\[ \|x\| = \sqrt{|\alpha|^2 + |\beta|^2}, \quad \|y\| = 1, \quad \text{and} \quad (x, y) = \alpha; \]

the Schwarz inequality reduces to the statement

\[ |\alpha| \le \sqrt{|\alpha|^2 + |\beta|^2}, \]

which becomes an equation just when β = 0.

An approach to the theorem that is neither black magic nor overly
simplistic could go like this. Assume that |(x, y)| = ‖x‖ · ‖y‖ and,
temporarily fixing a real parameter α, consider

\[ \|x - \alpha y\|^2 = (x - \alpha y, x - \alpha y) = \|x\|^2 - 2\,\mathrm{Re}(x, \alpha y) + |\alpha|^2 \|y\|^2. \]

This indicates why changing x so as to make (x, y) real is a helpful thing to
do; if that's done, then (for non-negative α) the right term becomes

\[ \bigl( \|x\| - |\alpha| \cdot \|y\| \bigr)^2. \]

Inspiration: choose the parameter α so as to make that term equal to 0
(which explains the reason for writing down the black magic expression);
the possibility of such a choice proves that x − αy can be made equal to 0,
which is a statement of linear dependence.


Solution 123. 


If M is a subspace of a finite-dimensional inner product space V, then M
and M^⊥ are complements (Problem 28), and M^⊥⊥ = M. For the proof,
consider an orthonormal basis {x₁, …, x_m} for the subspace M. If z is an
arbitrary vector in V, form the vectors

\[ x = \sum_{i=1}^{m} (z, x_i)\, x_i \quad \text{and} \quad y = z - x. \]

Since x is a linear combination of the x_i's, it belongs to M, and since y is
orthogonal to each x_i it belongs to M^⊥. Consequence:

\[ V = M + M^\perp. \]

If a vector u belongs to both M and M^⊥, then (u, u) = 0 (by the
definition of M^⊥): that implies, of course, that u = 0, that is that the
subspaces M and M^⊥ are disjoint. Conclusion (in the language of
Problem 50): V is the direct sum of M and M^⊥, and that's as good a
relation between M and M^⊥ as can be hoped for.

The definitions of x and y imply that

\[ (z, x) = (x + y, x) = \|x\|^2 + (y, x) = \|x\|^2, \]

and, similarly,

\[ (z, y) = (x + y, y) = (x, y) + \|y\|^2 = \|y\|^2. \]

It follows that if z is in M^⊥⊥, so that (z, y) = 0, then ‖y‖² = 0, so that
z = x and therefore z is in M; in other words M^⊥⊥ ⊂ M. Since the reverse
inclusion M ⊂ M^⊥⊥ is already known, it now follows that M = M^⊥⊥, and
that's as good a relation between M and M^⊥⊥ as can be hoped for.


Solution 124. 


The answer is yes: every linear functional ξ on an inner product space V is
induced as an inner product. For the proof it is good to look at the vectors
x for which ξ(x) = 0. If every x is like that, then ξ = 0, and there is
nothing more to say. In any case, the kernel of ξ is a subspace of V, and it
is pertinent to consider its orthogonal complement, which it is convenient
to denote by ker^⊥ ξ. If ker ξ ≠ V (and that may now be assumed), then
ker^⊥ ξ contains at least one non-zero vector y₀. It is true in fact (even
though for present purposes it is not strictly needed) that ker^⊥ ξ consists
of all scalar multiples of any such vector y₀; in other words, the subspace
ker^⊥ ξ has dimension 1. Indeed: if y is in ker^⊥ ξ, then so is every vector
of the form y − αy₀. The value of ξ at such a vector, that is

\[ \xi(y - \alpha y_0) = \xi(y) - \alpha\, \xi(y_0), \]

can be made equal to 0 by a suitable choice of the scalar α (namely,

\[ \alpha = \frac{\xi(y)}{\xi(y_0)} \; ), \]

which means, for that value of α, that y − αy₀ belongs to both ker ξ and
ker^⊥ ξ. Conclusion: y − αy₀ = 0, that is y = αy₀.

The vector y₀ "works" just fine for the vectors in ker ξ, meaning that
if x is in ker ξ, then

\[ \xi(x) = (x, y_0) \]

(because both sides are equal to 0), and the same thing is true for every
scalar multiple of y₀. Does the vector y₀ work for the vectors in ker^⊥ ξ
also? That is: is it true for an arbitrary element αy₀ in ker^⊥ ξ that

\[ \xi(\alpha y_0) = (\alpha y_0, y_0)? \]

The equation is equivalent to

\[ \xi(y_0) = \|y_0\|^2, \]

and there is no reason why that must be true, but, obviously, it can be true if
y₀ is replaced by a suitable scalar multiple of itself. Indeed: if y₀ is replaced
by γy₀, the desired equation reduces to

\[ \gamma\, \xi(y_0) = |\gamma|^2 \cdot \|y_0\|^2, \]

which can be satisfied by choosing γ so that

\[ \xi(y_0) = \bar\gamma\, \|y_0\|^2. \]
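
The rescaling step can be followed on a concrete functional. The following is a minimal sketch on C³ with the standard inner product; the coefficient list c that defines ξ, the starting vector y0, and the test vector x are all hypothetical choices made for illustration.

```python
def inner(x, y):
    """Inner product on C^n, linear in x and conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

c = [2, 1j, 1 - 1j]   # hypothetical coefficients: xi(x) = sum of c_i * x_i

def xi(x):
    return sum(ci * a for ci, a in zip(c, x))

# A non-zero vector orthogonal to ker xi (any multiple of conj(c) will do).
y0 = [2j * ci.conjugate() for ci in c]

# Choose gamma so that xi(y0) = conj(gamma) * ||y0||^2, then rescale y0.
gamma = (xi(y0) / inner(y0, y0)).conjugate()
y = [gamma * a for a in y0]

x = [1 + 1j, -2, 3j]
print(xi(x), inner(x, y))   # the two values agree
```

With the rescaled vector y the functional is induced as an inner product: ξ(x) = (x, y) for every x tried.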


Solution 125. 


(a) Since by the very definition of adjoints (U*η, ζ) is always equal to
(η, Uζ), the way to determine U* is to calculate with (U*η, ζ). That's not
inspiring, but it is doable. The way to do it is to begin with

\[ \bigl( U^*(x_1, y_1), (x_2, y_2) \bigr) \]

and juggle till it becomes an inner product with the same second term
(x₂, y₂) and a pleasant, simple first term that does not explicitly involve
U. The beginning is natural enough:

\[ \bigl( U^*(x_1, y_1), (x_2, y_2) \bigr) = \bigl( (x_1, y_1), U(x_2, y_2) \bigr)
   = \bigl( (x_1, y_1), (y_2, -x_2) \bigr) = (x_1, y_2) - (y_1, x_2). \]

That's an inner product all right, but it is one whose second term is
(y₂, −x₂) instead of (x₂, y₂). Easy to fix:

\[ \bigl( U^*(x_1, y_1), (x_2, y_2) \bigr) = (-y_1, x_2) + (x_1, y_2) = \bigl( (-y_1, x_1), (x_2, y_2) \bigr). \]

That does it: the identity so derived implies that

\[ U^*(x, y) = (-y, x), \]

and that's the sort of thing that is wanted. What it shows is a surprise: it
shows that

\[ U^* = -U. \]

The calculation of U*U is now trivial:

\[ U^*U(x, y) = U^*(y, -x) = (x, y), \]

so that U*U is equal to the identity transformation. The verification that
UU* is the same thing is equally trivial.

(b) Yes, a graph is always a subspace. The verification is direct: if
(x₁, y₁) and (x₂, y₂) are in the graph of A, so that

\[ y_1 = Ax_1 \quad \text{and} \quad y_2 = Ax_2, \]

then α₁(x₁, y₁) + α₂(x₂, y₂) is in the graph of A, because

\[ \alpha_1 y_1 + \alpha_2 y_2 = A(\alpha_1 x_1 + \alpha_2 x_2). \]


(c) The graph of A* is the orthogonal complement of the image under
U of the graph of A. To prove that, note that the graph of A is the set of all
pairs of the form (x, Ax), and hence the U image of that graph is the set
of all pairs of the form (−Ax, x). The orthogonal complement (in V ⊕ V)
of that image is the set of all those pairs (u, v) for which

\[ (-Ax, u) + (x, v) = 0 \]

identically in x. That means that

\[ (x, -A^*u + v) = 0 \]

for all x, and hence that A*u = v. The set of all pairs (u, v) for which
A*u = v is the set of all pairs of the form (u, A*u), and that's just the
graph of A*.

That wasn't bad to verify—was it?—but how could it have been dis- 
covered? That sort of question is always worth thinking about. 
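
Part (c) can be spot-checked numerically. The following is a minimal sketch, assuming V = C² with the standard inner product, U(x, y) = (y, −x) as in part (a), and an arbitrary sample matrix A; all names and the sample vectors are illustrative.

```python
def inner(x, y):
    """Inner product on C^n, linear in x and conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def apply(M, x):
    """Apply the matrix M (list of rows) to the vector x."""
    return [sum(m * a for m, a in zip(row, x)) for row in M]

def adjoint(M):
    """Conjugate transpose of M."""
    return [[M[j][i].conjugate() for j in range(len(M))]
            for i in range(len(M[0]))]

def pair_inner(p, q):
    """Inner product on the direct sum: ((x1, y1), (x2, y2)) = (x1, x2) + (y1, y2)."""
    return inner(p[0], q[0]) + inner(p[1], q[1])

def U(p):
    x, y = p
    return (y, [-a for a in x])

A = [[1, 2j], [3, 1 - 1j]]
x = [1 + 1j, 2]
u = [1j, 1 - 2j]

image = U((x, apply(A, x)))           # U applied to a point of the graph of A
point = (u, apply(adjoint(A), u))     # a point of the graph of A*
print(pair_inner(image, point))       # 0: the two are orthogonal
```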


Solution 126. 
(a) Yes, congruence is an equivalence relation. Indeed, clearly,

    A = P*AP (with P = 1);

    if B = P*AP, then A = Q*BQ (with Q = P⁻¹);

and

    if B = P*AP and C = Q*BQ, then C = R*AR (with R = PQ).

(b) Yes: if B = P*AP, then B* = P*A*P.

(c) No: a transformation congruent to a scalar doesn't have to be a
scalar. Indeed, if P is an arbitrary invertible transformation such that P*P
is not a scalar (such things abound), then P*P (= P* · 1 · P) is congruent
to the scalar 1 without being equal to it.

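(d) The answer to this one is not obvious; some head scratching is
needed. The correct answer is yes: it is possible for A and B to be congruent
without A² and B² being congruent. Here is one example:

\[ A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \quad \text{and} \quad
   B = \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}. \]

The computation is easy. If

\[ P = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \]

then

\[ P^*AP = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}
           \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
           \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}
         = \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix} = B, \]

so that, indeed, A is congruent to B. Since, however,

\[ A^2 = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} \quad \text{and} \quad
   B^2 = \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}, \]

it follows that A² cannot be congruent to B². (Is a microsecond's thought
necessary? Can the transformation 0 be congruent to a transformation that
is not 0? No: since P* · 0 · P = 0, it follows that being congruent to 0 implies
being equal to 0.)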
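
An example of the kind described in (d) can be verified mechanically. The following is a minimal sketch; the particular matrices A and P are an assumed instance (a nilpotent A and a unipotent P), chosen because they make the arithmetic transparent, and the helper names are illustrative.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def adjoint(M):
    """Conjugate transpose of M."""
    return [[M[j][i].conjugate() for j in range(len(M))]
            for i in range(len(M[0]))]

A = [[0, 1], [0, 0]]
P = [[1, 1], [0, 1]]                      # invertible (determinant 1)
B = matmul(matmul(adjoint(P), A), P)      # B = P*AP is congruent to A

print(B)              # [[0, 1], [0, 1]]
print(matmul(A, A))   # [[0, 0], [0, 0]]
print(matmul(B, B))   # [[0, 1], [0, 1]]
```

Since A² = 0 and B² ≠ 0, and only 0 is congruent to 0, the squares are not congruent.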


Solution 127. 


The desired statement is the converse of a trivial one: if A = 0, then
(Ax, x) = 0 for all x. In the non-trivial direction the corresponding
statement about sesquilinear forms (in place of quadratic ones) is
accessible: if (Ax, y) = 0 for all x and y, then A = 0. Proof: set y = Ax.
A possible approach to the quadratic result, therefore, is to reduce it to
the sesquilinear one: try to prove that if (Ax, x) = 0 for all x, then
(Ax, y) = 0 for all x and y.

What is wanted is (or should be?) reminiscent of polarization
(Solution 118). What that formula does is express the natural sesquilinear
form (x, y) in terms of the natural quadratic form ‖x‖². Can that expression
be generalized? Yes, it can, and the generalization is no more troublesome
than the original version. It looks like this:

\[ \begin{aligned}
(Ax, y) = {} & \bigl( A\,\tfrac12(x + y), \tfrac12(x + y) \bigr) - \bigl( A\,\tfrac12(x - y), \tfrac12(x - y) \bigr) \\
 & + i \bigl( A\,\tfrac12(x + iy), \tfrac12(x + iy) \bigr) - i \bigl( A\,\tfrac12(x - iy), \tfrac12(x - iy) \bigr).
\end{aligned} \]

Once that's done, everything is done: if (Ax, x) is identically 0, then so is
(Ax, y).


Solution 128. 


The product of two Hermitian transformations is not always Hermitian,
or, equivalently, the product of two conjugate symmetric matrices is not
always conjugate symmetric. It is hard not to write down an example. Here
is one:

\[ \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
   = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}. \]

Does the order matter? Yes, it matters in the sense that if the same two
matrices are multiplied in the other order, then they give a different
answer,

\[ \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
   = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \]

but the answer "no" does not change to the answer "yes".
How likely is the product of two Hermitian transformations to be
Hermitian? If A and B are Hermitian, and if AB also is Hermitian, then

\[ (AB)^* = AB, \]

which implies (since (AB)* = B*A* = BA) that BA = AB. What this proves is
that for the product of two Hermitian transformations to be Hermitian it is
necessary that they commute. Is the condition sufficient also? Sure: just
read the argument backward.
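
The commutativity criterion is easy to try out. The following is a minimal sketch with two pairs of Hermitian matrices, one pair that does not commute and one that does; the specific matrices are illustrative choices, not necessarily the ones printed in the book.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def adjoint(M):
    """Conjugate transpose of M."""
    return [[M[j][i].conjugate() for j in range(len(M))]
            for i in range(len(M[0]))]

def is_hermitian(M):
    return adjoint(M) == M

# A non-commuting Hermitian pair: the product is not Hermitian.
A = [[1, 0], [0, 0]]
B = [[0, 1], [1, 0]]
print(matmul(A, B) == matmul(B, A), is_hermitian(matmul(A, B)))

# A commuting Hermitian pair: the product is Hermitian.
C = [[2, 1j], [-1j, 2]]
D = [[1, 1j], [-1j, 1]]
print(matmul(C, D) == matmul(D, C), is_hermitian(matmul(C, D)))
```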


Solution 129. 
(a) If B = P*AP and A* = −A, then

\[ B^* = P^*A^*P = P^*(-A)P = -P^*AP = -B. \]

Conclusion: a transformation congruent to a skew one is skew itself.

(b) If A* = −A, then

\[ (A^2)^* = (A^*)^2 = (-A)^2 = A^2, \]

which is not necessarily the same as −A². Conclusion: the square of a skew
transformation doesn't have to be skew. Sermon: this is an incomplete
proof. For perfect honesty it should be accompanied by a concrete example
of a skew transformation A such that A² ≠ −A². One of the simplest
such transformations is given by the matrix

\[ \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}. \]

As for A³, since (−1)³ = −1, it follows that A* = −A implies (A³)* =
−A³, so that A³ is skew along with A.

(c) Write

\[ S \ (\text{for sum}) = AB + BA \]

and

\[ D \ (\text{for difference}) = AB - BA. \]

The question is: what happens to

\[ S^* = B^*A^* + A^*B^* \]

and

\[ D^* = B^*A^* - A^*B^* \]

when A* and B* are replaced by A and B, possibly with changes of sign? 
The answer is that if the number of sign changes is even (0 or 2), then S 
remains Hermitian and D remains skew, but if the number of sign changes 
is odd (which has to mean 1), then S becomes skew and D becomes Her- 
mitian. 


Solution 130. 
If A = A*, then

\[ (Ax, x) = (x, A^*x) = (x, Ax) = \overline{(Ax, x)}, \]

so that (Ax, x) is equal to its own conjugate and is therefore real. If,
conversely, (Ax, x) is always real, then

\[ (Ax, x) = \overline{(Ax, x)} = (x, Ax) = (A^*x, x), \]

so that ((A − A*)x, x) = 0 for all x, and, by Problem 127, A = A*.
Solution 131.


(a) The entries not on the main diagonal influence positiveness less than
the ones on it. So, for example, from the known positiveness of

\[ \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \]

it is easy to infer the positiveness of

\[ \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}. \]
2 2 

(b) Yes, and an example has already been seen, namely E 1 ). 
(c) A careful look at

\[ \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} \]

shows that the quadratic form associated with it is

\[ |\xi_1 + \xi_2 + \xi_3|^2, \]

and that answers the question: yes, the matrix is positive.

(d) The quadratic form associated with

\[ \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix} \]

is

\[ |\xi_1 + \xi_3|^2 + |\xi_2|^2, \]

and that settles the matter; yes, the matrix is positive.

(e) The quadratic form associated with

\[ \begin{pmatrix} \alpha & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix} \]

is

\[ \alpha |\xi_1|^2 + 2\,\mathrm{Re}\, \xi_1 \bar\xi_2 + 2\,\mathrm{Re}\, \xi_1 \bar\xi_3, \]

and the more one looks at that, the less positive it looks. It doesn't really
matter what ξ₃ is; it will do no harm to set it equal to 0. The enemy is the
coefficient α, and it can be conquered. No matter what α is, choose ξ₁ to
be 1, and then choose ξ₂ to be a gigantic negative number: the resulting
value of the quadratic form will be negative. The answer to the question as
posed is: none.


Solution 132. 


Yes, if a positive transformation is invertible, then its inverse also is posi- 
tive. The proof takes one line, but a trick has to be thought of. 
How does it follow from (Ax, x) ≥ 0 for all x that (A⁻¹y, y) ≥ 0 for
all y? Answer: put y = Ax. Indeed, then

\[ (A^{-1}y, y) = (A^{-1}Ax, Ax) = (x, Ax) = (Ax, x), \]

and the proof is complete.
(Is the reason for the last equality sign clear? Since A is positive,
A is Hermitian, and therefore (x, Ax) = (Ax, x).)


Solution 133. 


If E is the perpendicular projection onto M, so that E is the projection
onto M along M^⊥, then Problem 82 implies that E* is the projection
onto (M^⊥)^⊥ along M^⊥. (Problem 82 talks about annihilators instead
of orthogonal complements, but the two languages can be translated
back and forth mechanically.) That means that E* is the perpendicular
projection onto M (along M^⊥), and that is exactly E.

If, conversely, E = E² = E*, then the idempotence of E guarantees
that E is the projection onto ran E along ker E (Problem 72). If x is in
ran E and y is in ker E, then

\[ \begin{aligned}
(x, y) &= (Ex, y) && \text{(because the vectors in the range of a projection are fixed points of it; see Problem 72)} \\
       &= (x, E^*y) && \text{(just by the definition of adjoints)} \\
       &= (x, Ey) && \text{(because $E$ was assumed to be Hermitian)} \\
       &= 0 && \text{(because $y$ is in $\ker E$).}
\end{aligned} \]


Consequence: ran E and ker E are not only complements—they are or- 
thogonal complements, and, therefore, E is a perpendicular projection. 


Summary. Perpendicular projections are exactly those linear transforma- 
tions that are both Hermitian and idempotent. 
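Solution 134.


Since a perpendicular projection is Hermitian, the matrix of a projection
on C² must always look like

\[ \begin{pmatrix} \alpha & \beta \\ \bar\beta & \gamma \end{pmatrix}, \]

where α and γ must be real, and must, in fact, be in the unit interval.
(Why?) The question then is just this: which of the matrices that look like
that are idempotent?

To get the answer, compute. If

\[ \begin{pmatrix} \alpha & \beta \\ \bar\beta & \gamma \end{pmatrix}
 = \begin{pmatrix} \alpha & \beta \\ \bar\beta & \gamma \end{pmatrix}^2
 = \begin{pmatrix} \alpha^2 + |\beta|^2 & \alpha\beta + \beta\gamma \\ \alpha\bar\beta + \bar\beta\gamma & |\beta|^2 + \gamma^2 \end{pmatrix}, \]

then (top right corner) α + γ = 1, so that

\[ \gamma = 1 - \alpha. \]

Consequence (lower right corner): |β|² + (1 − α)² = 1 − α, which simplifies
to

\[ |\beta|^2 = \alpha(1 - \alpha). \]

Conclusion: the matrices of projections on C² are exactly the ones of the
form

\[ \begin{pmatrix} \alpha & \theta\sqrt{\alpha(1 - \alpha)} \\ \bar\theta\sqrt{\alpha(1 - \alpha)} & 1 - \alpha \end{pmatrix}, \]

where

\[ 0 \le \alpha \le 1 \quad \text{and} \quad |\theta| = 1. \]

Comment. The case β = 0 seems to be more important than any other;
in any event it is the one we are most likely to bump into.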
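
The parametrization can be checked for particular values. The following is a minimal sketch; the values of alpha and theta are arbitrary samples (any 0 ≤ α ≤ 1 and |θ| = 1 should do), and the helper names are illustrative.

```python
import cmath
from math import sqrt

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def max_diff(X, Y):
    """Largest absolute difference between corresponding entries."""
    return max(abs(X[i][j] - Y[i][j])
               for i in range(len(X)) for j in range(len(X[0])))

alpha = 0.3
theta = cmath.exp(0.7j)            # a complex number of absolute value 1
s = sqrt(alpha * (1 - alpha))

E = [[alpha, theta * s],
     [theta.conjugate() * s, 1 - alpha]]
E_star = [[E[j][i].conjugate() for j in range(2)] for i in range(2)]

print(max_diff(matmul(E, E), E) < 1e-12)   # idempotent
print(max_diff(E_star, E) < 1e-12)         # Hermitian
```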


Solution 135. 


If E and F are projections, with ran E = M and ran F = N, then the
statements

\[ E \le F \quad \text{and} \quad M \subset N \]

are equivalent.
Suppose, indeed, that E ≤ F. If x is in M, then

\[ (Fx, x) \ge (Ex, x) = (x, x). \]

Since the reverse inequality

\[ (x, x) \ge (Fx, x) \]

is always true, it follows that

\[ ((1 - F)x, x) = 0, \]

and hence that

\[ \|(1 - F)x\| = 0. \]

(Why is the last "hence" true?) Conclusion: Fx = x, so that x is in N.

If, conversely, M ⊂ N, then FEx = Ex (because Ex is in M for all x),
so that FE = E. It follows (from adjoints) that EF = E, and that justifies
a small computational trick:

\[ (Ex, x) = \|Ex\|^2 = \|EFx\|^2 \le \|Fx\|^2 = (Fx, x). \]

Conclusion: E ≤ F.


Solution 136. 


If E and F are projections with ran E = M and ran F = N, then the
statements

\[ M \perp N \quad \text{and} \quad EF = 0 \]

are equivalent.
Suppose indeed that EF = 0. If x is in M and y is in N, then

\[ (x, y) = (Ex, Fy) = (x, E^*Fy) = (x, EFy) = 0. \]

If, conversely, M ⊥ N, so that

\[ N \subset M^\perp, \]

then, since Fx is in N for all x, it follows that Fx is in M^⊥ for all x.
Conclusion: EFx = 0 for all x.


Solution 137. 


If A is Hermitian, and if x is a non-zero vector such that Ax = λx, then,
of course,

\[ (Ax, x) = \lambda (x, x); \]

since (Ax, x) is real (Problem 130), it follows that λ is real. If, in addition,
A is positive, so that (Ax, x) is positive, then it follows that λ is positive.
Note that these conditions on the eigenvalues of Hermitian and positive
transformations are necessary, but nowhere near sufficient.


Solution 138. 


The answer is that for Hermitian transformations eigenvectors belonging 
to distinct eigenvalues must be orthogonal. 
Suppose, indeed, that 


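\[ Ax_1 = \lambda_1 x_1 \quad \text{and} \quad Ax_2 = \lambda_2 x_2, \]

with λ₁ ≠ λ₂. If A is Hermitian, then

\[ \begin{aligned}
\lambda_1 (x_1, x_2) = (Ax_1, x_2) &= (x_1, Ax_2) && \text{(why?)} \\
 &= \lambda_2 (x_1, x_2) && \text{(why?).}
\end{aligned} \]

Since λ₁ ≠ λ₂, it must follow that (x₁, x₂) = 0.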


Comment. Since the product of the eigenvalues of a transformation on 
a finite-dimensional complex vector space is equal to its determinant (re- 
member triangularization), these results imply that the determinant of a 
Hermitian transformation is real. Is there an obvious other way to get the 
same result? 


Chapter 9. Normality 


Solution 139. 


Caution: the answer depends on whether the underlying vector space is of 
finite or infinite dimension. 

For finite-dimensional spaces the answer is yes. Indeed, if U*U = 1,
then U must be injective, and therefore surjective (Problem 66), and
therefore invertible (definition), and once that's known the equation
U*U = 1 can be multiplied by U⁻¹ on the right to get U* = U⁻¹.

For infinite-dimensional spaces the answer may be no. Consider,
indeed, the set V of all finitely non-zero infinite sequences

\[ \{\xi_1, \xi_2, \xi_3, \ldots\} \]

of complex numbers. The phrase "finitely non-zero" means that each
sequence has only a finite number of non-zero terms (though that finite
number might vary from sequence to sequence). With the obvious way of
adding sequences and multiplying them by complex scalars, V is a complex
vector space. With the definition

\[ \bigl( \{\xi_1, \xi_2, \xi_3, \ldots\}, \{\eta_1, \eta_2, \eta_3, \ldots\} \bigr) = \sum_{n=1}^{\infty} \xi_n \bar\eta_n \]

the space V becomes an inner product space. If U and W are defined by

\[ U\{\xi_1, \xi_2, \xi_3, \ldots\} = \{0, \xi_1, \xi_2, \xi_3, \ldots\} \]

and

\[ W\{\xi_1, \xi_2, \xi_3, \ldots\} = \{\xi_2, \xi_3, \xi_4, \ldots\}, \]

then U and W are linear transformations on V, and a simple computation
establishes that the equation

\[ (Ux, y) = (x, Wy) \]

is true for every pair of vectors x and y in V. In other words W is exactly
the adjoint U* of U, and, as another, even simpler, computation shows,

\[ U^*U = 1. \]

(Caution: it is essential to keep in mind that when U*U is applied to a
vector x, the transformation U is applied first.) It is, however, not true that
UU* = 1. Not only does U* fail to be the inverse of U, but in fact U has
no inverse at all. The range of U contains only those vectors whose first 
coordinate is 0, so that the range of U is not the entire space V—that's 
what rules out invertibility. 
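
The two shifts are easy to model. The following is a minimal sketch that represents a finitely non-zero sequence by a Python list of its leading terms (all later terms being understood as 0); the function names U and W follow the text, and the sample sequences are arbitrary.

```python
def inner(x, y):
    """Sum of x_n * conj(y_n); terms beyond the shorter list are 0."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def U(x):
    """Unilateral shift: insert a 0 in front."""
    return [0] + list(x)

def W(x):
    """Backward shift: drop the first coordinate."""
    return list(x[1:])

x = [1, 2j, 3]
y = [1j, 1, 0, 2]

print(inner(U(x), y) == inner(x, W(y)))  # (Ux, y) = (x, Wy)
print(W(U(x)) == x)                      # U*U = 1
print(U(W(x)) == x)                      # UU* = 1 fails: the first term is lost
```

The last line prints False: UU* replaces the first coordinate by 0.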


Solution 140. 


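When is

\[ \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} \]

the matrix of a unitary transformation on C²? Answer: if and only if the
product of

\[ \begin{pmatrix} \bar\alpha & \bar\gamma \\ \bar\beta & \bar\delta \end{pmatrix}
   \quad \text{and} \quad
   \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} \]

is the identity matrix. Since

\[ \begin{pmatrix} \bar\alpha & \bar\gamma \\ \bar\beta & \bar\delta \end{pmatrix}
   \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}
 = \begin{pmatrix} |\alpha|^2 + |\gamma|^2 & \bar\alpha\beta + \bar\gamma\delta \\ \alpha\bar\beta + \gamma\bar\delta & |\beta|^2 + |\delta|^2 \end{pmatrix}, \]

that condition says that

\[ |\alpha|^2 + |\gamma|^2 = |\beta|^2 + |\delta|^2 = 1 \quad \text{and} \quad \bar\alpha\beta + \bar\gamma\delta = 0, \]

or, in other words that the vectors (α, γ) and (β, δ) in C² constitute an
orthonormal set.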

This 2 × 2 calculation extends to the general case. If U is a linear
transformation on a finite-dimensional inner product space, and if the
matrix of U with respect to an orthonormal basis is (u_ij), then a necessary
and sufficient condition that U be unitary is that

\[ \sum_k \bar u_{ki}\, u_{kj} = \delta_{ij}. \]

That matrix equation is, in fact, just the equation U*U = 1 in matrix
notation.

These comments make it easy to answer the questions about the special
matrices in (a), (b), and (c).

For (a): since the second row is not a unit vector, it doesn't matter what
α is, the matrix can never be unitary.

For (b): the rows must be orthonormal unit vectors. Since the square of the
norm of each row is |α|² + ¼, the condition of normality is equivalent to
|α| = √3/2. Since the inner product of the two rows is ½(−α + ᾱ), their
orthogonality is equivalent to α being real. Conclusion: the matrix in (b)
is unitary if and only if α = ±√3/2.

For (c): the question is an awkward way of asking whether or not a
multiple of (1, 1, 1) can be the first term of an orthonormal set. The answer
is: why not? In detail: if ω is a complex cube root of 1 (ω ≠ 1), then the
vectors

\[ (1, 1, 1), \quad (1, \omega, \omega^2), \quad \text{and} \quad (1, \omega^2, \omega) \]

are pairwise orthogonal and all have the same norm (√3); normalization
yields an explicit answer to the question.
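
The answer to (c) can be confirmed numerically. The following is a minimal sketch that builds the three vectors from a primitive cube root of unity, normalizes them, and checks that the rows form an orthonormal set; the variable names are illustrative.

```python
import cmath
from math import sqrt

def inner(x, y):
    """Inner product on C^n, linear in x and conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

w = cmath.exp(2j * cmath.pi / 3)          # a complex cube root of 1, w != 1
rows = [[1, 1, 1], [1, w, w * w], [1, w * w, w]]
U = [[entry / sqrt(3) for entry in row] for row in rows]

gram = [[inner(U[i], U[j]) for j in range(3)] for i in range(3)]
worst = max(abs(gram[i][j] - (1 if i == j else 0))
            for i in range(3) for j in range(3))
print(worst < 1e-12)   # the rows are orthonormal, so the matrix is unitary
```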


Solution 141. 


None of the three conditions U* = U, U*U = 1, and U? = 1 implies any 
of the others. Indeed, 

\[ \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix} \]

is Hermitian but neither unitary nor involutory;

\[ \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \]

is unitary but neither Hermitian nor involutory; and

\[ \begin{pmatrix} 1 & -2 \\ 0 & -1 \end{pmatrix} \]

is involutory but neither Hermitian nor unitary.

The implicative power of pairs of these conditions is much greater than 
that of each single one; indeed it turns out that any two together imply the 
third. That’s very easy; here is how it goes. 

If U* = U, then the factor U* in U*U can be replaced by U, and, 
consequently, U*U = 1 implies U? = 1. 

If U*U = 1 and U? = 1, then of course U*U = U?; multiply by U-! 
(= U*) on the right and get U* = U. 

If, finally, U* = U, then one of the factors U in U? can be replaced by 
U*, and consequently, U? = 1 implies U*U = 1. 
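
The independence of the three conditions can be tabulated by machine. The following is a minimal sketch; the three sample matrices are assumed instances with the stated properties, and the labels H, K, J are hypothetical names introduced here.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def adjoint(M):
    """Conjugate transpose of M."""
    return [[M[j][i].conjugate() for j in range(len(M))]
            for i in range(len(M[0]))]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def is_hermitian(M):
    return adjoint(M) == M

def is_unitary(M):
    return matmul(adjoint(M), M) == identity(len(M))

def is_involutory(M):
    return matmul(M, M) == identity(len(M))

examples = {
    "H": [[1, 0], [0, 2]],     # Hermitian only
    "K": [[0, 1], [-1, 0]],    # unitary only
    "J": [[1, -2], [0, -1]],   # involutory only
}
for name, M in examples.items():
    print(name, is_hermitian(M), is_unitary(M), is_involutory(M))
```

Each row of the printed table contains exactly one True.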


Solution 142. 


Each row and each column of a unitary matrix is a unit vector. If, in par- 
ticular, a unitary matrix is triangular (upper triangular, say), then its first 
column is of the form 


(*,0,0,0,...), 


and, consequently, those entries in the first row that come after the first 
can contribute nothing—they must all be 0. Proceed inductively: now it’s 
known that the second column is of the form 


(0, *,0,0,...), 


and, consequently, those entries in the second row that come after the first 
two can contribute nothing—etc., etc. Conclusion: a triangular unitary ma- 
trix must be diagonal. 


Comment. This solution tacitly assumed that the matrices in question 
correspond to unitary transformations via orthonormal bases. A similar 
comment applies in the next problem, about Hermitian diagonalizability. 


Solution 143. 


The answer is yes; every Hermitian matrix is unitarily similar to a
diagonal one. This result is one of the cornerstones of linear algebra (or,
perhaps more modestly, of the part of linear algebra known as unitary
geometry). Its proof is sometimes considered recondite, but with the tools
already available here it is easy.

Suppose, indeed, that A is a Hermitian transformation with the
distinct eigenvalues

\[ \lambda_1, \ldots, \lambda_r \]

and corresponding eigenspaces M_i:

\[ M_i = \{x : Ax = \lambda_i x\}, \]

i = 1, …, r. If i ≠ j (so that λ_i ≠ λ_j), then

\[ M_i \perp M_j \]

(by Problem 138). The M_i's must span the entire space. Reason: the
restriction of A to the orthogonal complement of their span is still a
Hermitian transformation and, as such, has eigenvalues and corresponding
eigenspaces.

That settles everything. Just choose an orthonormal basis within each
M_i, and note that the union of all those little bases is an orthonormal basis
for the whole space. Otherwise said: there exists an orthonormal basis

\[ x_1, \ldots, x_n \]

of eigenvectors; the matrix of A with respect to that basis is diagonal.


Solution 144. 


The answer is 1: every positive transformation has a unique positive square
root. A quick proof of existence goes via diagonalization. If A ≥ 0, then,
in particular, A is Hermitian, and, consequently, A can be represented by
a diagonal matrix such as

\[ A = \begin{pmatrix} \alpha & 0 & 0 \\ 0 & \beta & 0 \\ 0 & 0 & \gamma \end{pmatrix}. \]

The diagonal entries α, β, γ, … are the eigenvalues of A, and, therefore,
they are real; since, moreover, A is positive, it follows that they are positive.
Write

\[ B = \begin{pmatrix} \sqrt\alpha & 0 & 0 \\ 0 & \sqrt\beta & 0 \\ 0 & 0 & \sqrt\gamma \end{pmatrix} \]

(where the indicated numerical square roots are the positive ones), and
jump happily to the conclusions that (i) B ≥ 0 and (ii) B² = A.
What about uniqueness? If C ≥ 0 and C² = A, then C can be
diagonalized,

\[ C = \begin{pmatrix} \xi & 0 & 0 \\ 0 & \eta & 0 \\ 0 & 0 & \zeta \end{pmatrix}. \]

The numbers ξ, η, ζ, … are positive and their squares are the numbers
α, β, γ, …: Q.E.D.
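
The existence argument can be imitated on a small example. The following is a minimal sketch for a real symmetric 2 × 2 matrix whose orthonormal eigenvectors are known in advance (so no eigenvalue routine is needed); the matrix Q, the eigenvalues 3 and 1, and the helper names are assumptions made for the illustration.

```python
from math import sqrt

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(M):
    """Transpose (equal to the adjoint here, since the entries are real)."""
    return [list(col) for col in zip(*M)]

def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

h = 1 / sqrt(2)
Q = [[h, h], [h, -h]]            # columns: orthonormal eigenvectors
eigenvalues = [3.0, 1.0]         # positive, so A is positive

A = matmul(matmul(Q, diag(eigenvalues)), transpose(Q))        # A = [[2, 1], [1, 2]]
B = matmul(matmul(Q, diag([sqrt(v) for v in eigenvalues])), transpose(Q))

B2 = matmul(B, B)
error = max(abs(B2[i][j] - A[i][j]) for i in range(2) for j in range(2))
print(error < 1e-12)   # B squared reproduces A
```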


Solution 145. 


Every linear transformation A on a finite-dimensional inner product space
is representable as

$$A = UP$$

with U unitary and P positive; if A is invertible, the representation is
unique.

To get a clue to a way of constructing U and P when only A is known,
think backward: assume the result and try to let it suggest the method. If
$A = UP$, then $A^* = PU^*$, and therefore

$$A^*A = P^2.$$

That's a big hint: since $A^*A$ and $P^2$ are positive linear transformations,
they have positive square roots; the equation $P^2 = A^*A$ implies the
square root equation

$$P = \sqrt{A^*A}.$$


That's enough of a hint: given A, define P by the preceding equation, and
then ask where U can come from. If A is to be equal to UP, then it's
tempting to "divide through" by P, which would make sense if P were
invertible. All right: assume for a moment that A is invertible; in that
case $A^*$ is invertible, and so are $A^*A$ and P. If U is defined by

$$U = AP^{-1},$$

then

$$U^*U = P^{-1}A^*AP^{-1} = P^{-1}P^2P^{-1} = 1,$$

and victory has been achieved: U is indeed unitary.

Uniqueness is not hard. If

$$U_1P_1 = U_2P_2,$$

with $U_1$ and $U_2$ unitary and $P_1$ and $P_2$ invertible and positive, then

$$P_1^2 = (U_1P_1)^*(U_1P_1) = (U_2P_2)^*(U_2P_2) = P_2^2,$$

and therefore (by the uniqueness of positive square roots)

$$P_1 = P_2.$$

"Divide" the equation $U_1P_1 = U_2P_2$ through by $P_1$ ($= P_2$) and
conclude that $U_1 = U_2$.

If A is not invertible, the argument becomes a little more fussy. What
is wanted is $Ax = UPx$ for all x, or, writing $y = Px$, what is wanted is

$$Uy = Ax$$

whenever y is in the range of P. Can that equation be used as a definition
of U; is it an unambiguous definition? That is: if one and the same y is in
the range of P for two reasons,

$$y = Px_1 \quad\text{and}\quad y = Px_2,$$

must it then be true that $Ax_1 = Ax_2$? The answer is yes: write

$$x = x_1 - x_2$$

and note the identity

$$\|Px\|^2 = (Px, Px) = (P^2x, x) = (A^*Ax, x) = \|Ax\|^2.$$

It implies that if $Px = 0$, then $Ax = 0$, or, in other words, that if

$$Px_1 = Px_2,$$

then

$$Ax_1 = Ax_2;$$

the proposed definition of U is indeed unambiguous.

Trouble: the proposed definition works on the range of P only; it
defines a linear transformation U with domain equal to ran P and range equal
to ran A. Since that linear transformation preserves lengths (and
therefore distances), it follows that ran A and ran P have the same dimension.
Consequence: $\operatorname{ran}^\perp A$ and $\operatorname{ran}^\perp P$ have the same
dimension, and, consequently, there exists a linear transformation V that
maps $\operatorname{ran}^\perp P$ onto $\operatorname{ran}^\perp A$ and that preserves lengths.
Extend the transformation U (already defined on ran P) to the entire space
by defining it to be equal to V on $\operatorname{ran}^\perp P$. The enlarged U has the
property that $\|Ux\| = \|x\|$ for all x, which implies that it is unitary;
since $A = UP$, everything falls into place.

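In the non-invertible case there is no hope of uniqueness and the
arbitrariness of the definition of U used in the proof shows why. For a
concrete counterexample consider

$$A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix};$$

both the equations

$$A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$$

and

$$A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$$

are polar decompositions of A.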
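The failure of uniqueness can be checked by machine. The following is only a sketch: the singular matrix A, its positive factor P, and the two unitary factors are an illustrative choice, and the helper names `matmul`, `transpose`, and `is_unitary` are hypothetical.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(X):
    """Transpose; for real matrices this is also the adjoint."""
    return [list(row) for row in zip(*X)]

def is_unitary(U):
    """Real case: U is unitary (orthogonal) iff U^T U is the identity."""
    n = len(U)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    return matmul(transpose(U), U) == identity

# Illustrative (assumed) example: A is singular, so uniqueness may fail.
A = [[0, 1], [0, 0]]
P = [[0, 0], [0, 1]]      # positive, and P*P equals A^T A
U1 = [[0, 1], [1, 0]]
U2 = [[0, 1], [-1, 0]]

# P is the positive square root of A*A.
assert matmul(P, P) == matmul(transpose(A), A)
```

Both `matmul(U1, P)` and `matmul(U2, P)` reproduce A, although U1 and U2 differ; they disagree only on the orthogonal complement of ran P, which is exactly where the proof had a free choice.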


Solution 146. 


Yes, eigenvectors belonging to distinct eigenvalues of a normal
transformation (on a finite-dimensional inner product space) must be
orthogonal. The natural way to try to prove that is to imitate the proof
that worked for Hermitian (and unitary) transformations. That is: assume that

$$Ax_1 = \lambda_1x_1 \quad\text{and}\quad Ax_2 = \lambda_2x_2,$$

with $\lambda_1 \ne \lambda_2$, and look at

$$(Ax_1, x_2) = (x_1, A^*x_2).$$

The left term is equal to $\lambda_1(x_1, x_2)$ (so far, so good), but there
isn't any grip on the right term. Or is there? Is there a connection between
the eigenvalues of a normal transformation and its adjoint? That is: granted
that $Ax = \lambda x$, can something intelligent be said about $A^*x$? Yes,
but it's a bit tricky.

The normality of A implies that

$$\|Ax\|^2 = (Ax, Ax) = (A^*Ax, x) = (AA^*x, x) = (A^*x, A^*x) = \|A^*x\|^2.$$

Since $A - \lambda$ is just as normal as A, and since

$$(A - \lambda)^* = A^* - \bar\lambda,$$

it follows that

$$\|(A - \lambda)x\| = \|(A^* - \bar\lambda)x\|.$$

Consequence: if $\lambda$ is an eigenvalue of A with eigenvector x, then
$\bar\lambda$ is an eigenvalue of $A^*$ with the same eigenvector x.

The imitation of the proof that worked in the Hermitian case can now
be comfortably resumed: since

$$(Ax_1, x_2) = \lambda_1(x_1, x_2)$$

and

$$(x_1, A^*x_2) = \lambda_2(x_1, x_2),$$

the distinctness of $\lambda_1$ and $\lambda_2$ implies the vanishing of
$(x_1, x_2)$, and the proof is complete.
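A small numerical illustration may make the two key facts concrete. The sketch below assumes a particular normal but non-Hermitian matrix (a rotation through a right angle) together with hand-computed eigenvectors; the helper names `inner`, `apply`, and `adjoint` are hypothetical.

```python
def inner(x, y):
    """Inner product: linear in the first argument, conjugate-linear in the second."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def apply(M, x):
    """Apply the matrix M (list of rows) to the vector x."""
    return [sum(M[i][k] * x[k] for k in range(len(x))) for i in range(len(M))]

def adjoint(M):
    """Conjugate transpose of a square matrix."""
    n = len(M)
    return [[complex(M[j][i]).conjugate() for j in range(n)] for i in range(n)]

# Assumed example: A*A = AA* = 1, so A is normal, but A is not Hermitian.
A = [[0, -1], [1, 0]]
lam1, x1 = 1j, [1, -1j]     # A x1 = i x1
lam2, x2 = -1j, [1, 1j]     # A x2 = -i x2

# x1 is an eigenvector of A* for the conjugate eigenvalue.
assert apply(adjoint(A), x1) == [lam1.conjugate() * c for c in x1]
```

With these choices `inner(x1, x2)` is 0, as the solution predicts for eigenvectors belonging to the distinct eigenvalues i and -i.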


Solution 147. 


The answer is yes: normal transformations are diagonalizable. The key
preliminary question is whether or not every restriction of a normal
transformation is normal. That is: if A is normal on V, if M is a subspace
of V, and if $A_M$ is the restriction $A|M$ of A to M (which means that
$A_Mx = Ax$ whenever x is in M), does it follow that $A_M$ is normal? The
trouble with the question is that it doesn't quite make sense, and it
doesn't quite make sense even for Hermitian transformations. The reason is
that the restriction is rigorously defined, but it may not be a linear
transformation on M; that is, it may fail to send vectors in M to vectors
in M. For the question to make sense it must be assumed that the subspace
is invariant under the transformation. All right, what if that is assumed?

One good way to learn the answer is to write the transformation A
under consideration as a 2 x 2 matrix according to the decomposition of
the space into M and $M^\perp$. The result looks like

$$A = \begin{pmatrix} P & * \\ 0 & * \end{pmatrix},$$

where P is the linear transformation $A_M$ on M and the asterisks are linear
transformations from $M^\perp$ to M (top right corner) and from $M^\perp$ to
$M^\perp$ (bottom right corner). It doesn't matter what linear
transformations they are, and there is no point in spending time inventing
a notation for them; what is important is the 0 in the lower left corner.
The reason for that 0 is the assumed invariance of M under A.

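Once such a matrix representation is known for A, one for $A^*$ can be
deduced:

$$A^* = \begin{pmatrix} P^* & 0 \\ * & * \end{pmatrix}.$$

Now use the normality of A in an easy computation. Write Q, for the
moment, for the top right entry of A; since

$$A^*A = \begin{pmatrix} P^* & 0 \\ Q^* & * \end{pmatrix} \begin{pmatrix} P & Q \\ 0 & * \end{pmatrix} = \begin{pmatrix} P^*P & * \\ * & * \end{pmatrix}$$

and

$$AA^* = \begin{pmatrix} P & Q \\ 0 & * \end{pmatrix} \begin{pmatrix} P^* & 0 \\ Q^* & * \end{pmatrix} = \begin{pmatrix} PP^* + QQ^* & * \\ * & * \end{pmatrix},$$

normality implies that $P^*P = PP^* + QQ^*$. Since $P^*P$ and $PP^*$ have
the same trace, the positive transformation $QQ^*$ has trace 0, so that
$Q = 0$, and therefore

$$P^*P = PP^*,$$

that is, it implies that P is normal; in other words $A_M$ is normal.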

That's all the hard work that has to be done—at this point the diago- 
nalizability theorem for normal transformations can be abandoned in good 
conscience. The point is that intellectually the proof resembles the one for 
Hermitian transformations in every detail. There might be some virtue in 
checking the technical details, and the ambitious reader is encouraged to 
do so—examine the proof of diagonalizability for Hermitian transforma- 
tions, replace the word “Hermitian” by “normal”, delete all references to 
reality, and insist that the action take place on a complex inner product 
space, and note, happily, that the remaining parts of the proof remain un- 
changed. 


Language. The diagonalizability of normal (and, in particular, Hermi- 
tian) transformations is sometimes called the spectral theorem. 


Solution 148. 
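If A and B are defined on $\mathbb{C}^2$ by

$$A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},$$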




then B is normal and every eigenspace of A is invariant under B, but A 
and B do not commute. 
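A mechanical check of a counterexample of this kind is easy to write down. The sketch below assumes the illustrative pair A, B shown in the code (a non-normal A whose only eigenspace is a line, and a Hermitian B); the helper names `matmul` and `apply` are hypothetical.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apply(M, x):
    """Apply the matrix M (list of rows) to the vector x."""
    return [sum(m * c for m, c in zip(row, x)) for row in M]

# Illustrative choice (an assumption, not necessarily the printed matrices):
A = [[0, 1], [0, 0]]   # not normal; its only eigenspace is spanned by (1, 0)
B = [[1, 0], [0, 0]]   # Hermitian, hence normal

eigvec = [1, 0]        # spans the kernel of A, the eigenspace for eigenvalue 0

AB = matmul(A, B)
BA = matmul(B, A)
```

Here B maps `eigvec` to itself, so the eigenspace of A is invariant under B, and yet `AB` is the zero matrix while `BA` is not.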

If, however, A is normal, and every eigenspace of A is invariant under
B, then A and B do commute. The most obvious approach to the proof is to
use the spectral theorem (Problem 147); the main purpose of that theorem
is, after all, to describe the relation between a normal transformation and
its eigenspaces. The assertion of the theorem can be formulated this way:
if A is normal with distinct eigenvalues $\lambda_1, \ldots, \lambda_r$, and if
$E_j$ is, for each j, the (perpendicular) projection on the eigenspace
corresponding to $\lambda_j$, then $A = \sum_j \lambda_jE_j$. The assumption that
the eigenspace corresponding to $\lambda_j$ is invariant under B can be
expressed in terms of $E_j$ as the equation

$$BE_j = E_jBE_j.$$

From the assumption that every eigenspace of A is invariant under B it
follows that the orthogonal complement of the eigenspace corresponding
to $\lambda_j$ is invariant under B (because it is spanned by the other
eigenspaces), and hence that

$$B(1 - E_j) = (1 - E_j)B(1 - E_j).$$

The two equations together simplify to

$$BE_j = E_jB,$$

and that, in turn, implies the desired commutativity $BA = AB$.


Solution 149. 


There are three ways for two of three prescribed linear transformations A, 
B, and C to be adjoints of one another; the adjoint pairs can be (A, B), or 
(B, C), or (A, C). There are, therefore, except for notational differences, 
just three possible commutativity hypotheses: 


A with $A^*$ and $A^*$ with C,
A with B and B with $B^*$,
A with B and B with $A^*$.


The questioned conclusion from the last of these is obviously false; for 
a counterexample choose A so that it is not normal and choose B = 0. 
The implications associated with the first two differ from one another in 
notation only; both say that if something commutes with a normal transfor- 
mation, then it commutes with the adjoint of that normal transformation. 
That implication is true. 
The simplest proof uses the fact that if A is normal, then a necessary
and sufficient condition for $AB = BA$ is that each of the eigenspaces of
A is invariant under B (see Solution 148). Consequence: if A is normal
and $AB = BA$, then the eigenspaces of A are invariant under B. The
normality of A implies that the eigenspaces of A are exactly the same as
the eigenspaces of $A^*$. Consequence: the eigenspaces of $A^*$ are invariant
under B. Conclusion: $A^*B = BA^*$, and the proof is complete.


Solution 150. 


Almost every known proof of the adjoint commutativity theorem (Solution
149) can be modified to yield the intertwining generalization: it is indeed
true that if A and B are normal and $AS = SB$, then $A^*S = SB^*$.
Alternatively, there is a neat derivation, via matrices whose entries are
linear transformations, of the intertwining version from the commutative
one.

Write

$$\hat A = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix} \quad\text{and}\quad \hat B = \begin{pmatrix} 0 & S \\ 0 & 0 \end{pmatrix}.$$

The transformation $\hat A$ is normal, and a straightforward verification
proves that $\hat B$ commutes with it. The adjoint commutativity theorem
implies that $\hat B$ commutes with $\hat A^*$ also. To get the desired
conclusion from this fact, just multiply the matrices $\hat A^*$ and $\hat B$
in both orders and compare corresponding entries.
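For the reader who wants to see the multiplications, here is a worked version, under the assumption that $\hat A$ is the block diagonal matrix with entries A and B, and that $\hat B$ has S in the top right corner and 0 elsewhere.

```latex
\hat A \hat B
  = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}
    \begin{pmatrix} 0 & S \\ 0 & 0 \end{pmatrix}
  = \begin{pmatrix} 0 & AS \\ 0 & 0 \end{pmatrix},
\qquad
\hat B \hat A
  = \begin{pmatrix} 0 & S \\ 0 & 0 \end{pmatrix}
    \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}
  = \begin{pmatrix} 0 & SB \\ 0 & 0 \end{pmatrix},
\\[1ex]
\hat A^* \hat B
  = \begin{pmatrix} A^* & 0 \\ 0 & B^* \end{pmatrix}
    \begin{pmatrix} 0 & S \\ 0 & 0 \end{pmatrix}
  = \begin{pmatrix} 0 & A^*S \\ 0 & 0 \end{pmatrix},
\qquad
\hat B \hat A^*
  = \begin{pmatrix} 0 & S \\ 0 & 0 \end{pmatrix}
    \begin{pmatrix} A^* & 0 \\ 0 & B^* \end{pmatrix}
  = \begin{pmatrix} 0 & SB^* \\ 0 & 0 \end{pmatrix}.
```

The first line shows that $AS = SB$ is the same as the commutativity of the two block matrices; the second line shows that commutativity with the adjoint is the same as $A^*S = SB^*$.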


Solution 151. 


Yes; if A, B, and AB are all normal, then BA is normal too. One good way 
to prove that statement is a splendid illustration of what is called a trace 
argument. In general terms, a trace argument can sometimes be used to 
prove an equation between linear transformations, or, what comes to the 
same thing, to prove that some linear transformation C is equal to 0, by 
proving that the trace of C*C is 0. Since C*C is positive, the only way it 
can have trace 0 is to be 0, and once C*C is known to be 0 it is immediate 
that C itself must be 0. The main techniques available to prove that the 
trace of something is 0 are the additivity of trace, 


$$\operatorname{tr}(X + Y) = \operatorname{tr} X + \operatorname{tr} Y,$$


and the invariance of the trace of a product under cyclic permutations of 
its factors, 


tr(XY Z) = tr(ZXY). 
If it could be proved that A and B must commute, then all would
be well (see the discussion preceding the statement of the problem), but
that is not necessarily true (see the discussion preceding the statement
of the problem). A step in the direction of commutativity can be taken
anyway: the assumptions do imply that B commutes with $A^*A$. That is: if
$C = BA^*A - A^*AB$, then $C = 0$. That's where the trace argument comes
in.

A good way to study $C^*C$ is to multiply out

$$(A^*AB^* - B^*A^*A)(BA^*A - A^*AB),$$

getting

$$A^*AB^*BA^*A - B^*A^*ABA^*A - A^*AB^*A^*AB + B^*A^*AA^*AB,$$

and then examine each of the four terms. As a device in that examination,
introduce an ad hoc equivalence relation, indicated by $X \sim Y$ for any two
products X and Y, if they can be obtained from one another by a cyclic
permutation of factors. A curious thing happens: the assumptions (A, B,
and AB are normal) and the cyclic permutation property of trace imply
that all four terms are equivalent to one another. Indeed:

$$\begin{aligned}
A^*AB^*BA^*A &= A^*ABB^*A^*A && \text{(because $B$ is normal)} \\
  &= A^*B^*A^*ABA && \text{(because $AB$ is normal)}, \\
B^*A^*ABA^*A &= B^*A^*ABAA^* && \text{(because $A$ is normal)} \\
  &\sim A^*B^*A^*ABA, \\
A^*AB^*A^*AB &= AA^*B^*A^*AB && \text{(because $A$ is normal)} \\
  &\sim A^*B^*A^*ABA, \\
B^*A^*AA^*AB &\sim AA^*ABB^*A^* \\
  &= AA^*B^*A^*AB && \text{(because $AB$ is normal)} \\
  &\sim A^*B^*A^*ABA.
\end{aligned}$$

Consequence: all four terms have the same trace, and, therefore, the trace
of $C^*C$ is 0.

The result of the preceding paragraph implies that B commutes with
$A^*A$. If $A = UP$ is the polar decomposition of A, then U commutes with P
(because A is normal), and, since B commutes with $P^2$ ($= A^*A$), it follows
that B also commutes with P. These commutativities imply that

$$U^*(AB)U = U^*(UP)BU = (U^*U)(BP)U = B(UP) = BA.$$

Conclusion: BA is unitarily equivalent to the normal transformation AB,
and, consequently, BA itself must be normal.


Solution 152. 


(a) The adjoint of a matrix is its conjugate transpose. Polynomials are not
clever enough to transpose matrices. If, for instance,

$$A = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix},$$

then every polynomial in A is of the form

$$\begin{pmatrix} \alpha & 0 \\ \beta & \alpha \end{pmatrix},$$

which has no chance of being equal to

$$A^* = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.$$

Question. What made this A work? Would any non-normal A work just
as well?

(b) This time the answer is yes; the inverse of an invertible matrix A
can always be obtained via a polynomial. For the proof, consider the
characteristic polynomial

$$\lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0$$

of A and observe that $a_0$ cannot be 0. Reason: the assumed invertibility
of A implies that 0 is not an eigenvalue. Multiply the Hamilton-Cayley
equation

$$A^n + a_{n-1}A^{n-1} + \cdots + a_1A + a_0 = 0$$

by $A^{-1}$ to get

$$A^{n-1} + a_{n-1}A^{n-2} + \cdots + a_1 + a_0A^{-1} = 0.$$

Conclusion: if

$$p(\lambda) = -\frac{1}{a_0}\left(\lambda^{n-1} + a_{n-1}\lambda^{n-2} + \cdots + a_1\right),$$

then $p(A) = A^{-1}$.
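In the 2 x 2 case the recipe can be carried out by hand and checked by machine. The sketch below assumes integer entries and uses exact rational arithmetic; the function name `inverse_via_cayley_hamilton` is hypothetical.

```python
from fractions import Fraction

def inverse_via_cayley_hamilton(A):
    """Inverse of an invertible 2 x 2 integer matrix as a polynomial in A.

    The characteristic polynomial is t^2 + a1*t + a0 with a1 = -trace and
    a0 = det, so p(t) = -(t + a1)/a0 and p(A) = -(A + a1*I)/a0.
    """
    (a, b), (c, d) = A
    a1 = -(a + d)
    a0 = a * d - b * c
    if a0 == 0:
        raise ValueError("matrix is not invertible")
    # Entries of -(A + a1*I)/a0, computed exactly.
    return [[-Fraction(a + a1, a0), -Fraction(b, a0)],
            [-Fraction(c, a0), -Fraction(d + a1, a0)]]

example_inverse = inverse_via_cayley_hamilton([[2, 1], [1, 1]])
```

For the matrix with rows (2, 1) and (1, 1) the trace is 3 and the determinant is 1, so that p(A) = 3 - A, which is indeed the inverse.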


Solution 153. 


The answer is that all positive matrices are Gramians. Suppose, indeed,
that $A \ge 0$ and infer (Problem 144) that there exists a positive matrix B
such that $B^2 = A$. If $A = (\alpha_{ij})$, then the equations

$$\alpha_{ij} = (Ae_j, e_i) \ \text{(why?)} \ = (B^2e_j, e_i) = (Be_j, Be_i)$$

imply that A is a Gramian (the Gramian of the vectors $Be_1, Be_2, \ldots$),
and that's all there is to it.


Solution 154. 


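Squaring is not monotone; a simple counterexample is given by the
matrices

$$A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}.$$

The relation $A \le B$ can be verified by inspection. Since

$$A^2 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \quad\text{and}\quad B^2 = \begin{pmatrix} 5 & 3 \\ 3 & 2 \end{pmatrix},$$

so that

$$B^2 - A^2 = \begin{pmatrix} 4 & 3 \\ 3 & 2 \end{pmatrix},$$

it is also easy to see that the relation $A^2 \le B^2$ is false; indeed, the
determinant of $B^2 - A^2$ is negative.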
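The inspection can be delegated to a few lines of code. The sketch below takes A with rows (1, 0), (0, 0) and B with rows (2, 1), (1, 1), and uses the fact that a real symmetric 2 x 2 matrix is positive exactly when its trace and determinant are both non-negative; the helper names are hypothetical.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def sub(X, Y):
    """Entrywise difference X - Y."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def is_positive_2x2(H):
    """A real symmetric 2 x 2 matrix is positive (semidefinite) iff
    its trace and its determinant are both non-negative."""
    (a, b), (c, d) = H
    return b == c and a + d >= 0 and a * d - b * c >= 0

A = [[1, 0], [0, 0]]
B = [[2, 1], [1, 1]]

difference = sub(B, A)                                  # B - A
difference_of_squares = sub(matmul(B, B), matmul(A, A)) # B^2 - A^2
```

The first difference passes the positivity test and the second fails it, because its determinant is 4*2 - 3*3 = -1.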

Is it a small blemish that not both the matrices in this example are
invertible? That's easy to cure (at the cost of an additional small amount
of computation): the matrices $A + 1$ and $B + 1$ are also a counterexample.

That's the bad news; for square roots the news is good. That is: if

$$0 \le A \le B,$$

then it is true that

$$\sqrt{A} \le \sqrt{B}.$$

Various proofs of that conclusion can be constructed, but none of them
jumps to the eye; the only way to go is by honest toil. The idea of one proof
is to show that every eigenvalue of $\sqrt{B} - \sqrt{A}$ is non-negative; for
invertible Hermitian transformations that property is equivalent to
positiveness. All right: suppose then that $\lambda$ is an eigenvalue of
$\sqrt{B} - \sqrt{A}$, with corresponding (non-zero) eigenvector x, so that

$$\sqrt{A}x = \sqrt{B}x - \lambda x;$$

it is to be shown that $\lambda \ge 0$.
If it happens that $\sqrt{B}x = 0$, then, of course, $Bx = 0$, and therefore
it follows from the assumed relation between A and B that $(Ax, x) = 0$.
Consequence: $\sqrt{A}x = 0$. Reason:

$$0 = (Ax, x) = (\sqrt{A}\sqrt{A}x, x) = (\sqrt{A}x, \sqrt{A}x) = \|\sqrt{A}x\|^2.$$

Once that's known, then the assumed eigenvalue equation implies that
$\lambda x = 0$, and hence that $\lambda = 0$.

If $\sqrt{B}x \ne 0$, then $(\sqrt{B}x, x) \ne 0$; to see that, apply the chain
of equations displayed just above to $\sqrt{B}$ instead of A. Consequence:

$$\begin{aligned}
(\sqrt{B}x, \sqrt{B}x) &= \|\sqrt{B}x\|^2 \\
  &\ge \|\sqrt{B}x\| \cdot \|\sqrt{A}x\| && \text{(why?)} \\
  &\ge (\sqrt{B}x, \sqrt{A}x) && \text{(why?)} \\
  &= (\sqrt{B}x, \sqrt{B}x - \lambda x) \\
  &= (\sqrt{B}x, \sqrt{B}x) - \lambda(\sqrt{B}x, x).
\end{aligned}$$

Conclusion: $\lambda \ge 0$, because the contrary possibility yields the
contradiction

$$(\sqrt{B}x, \sqrt{B}x) > (\sqrt{B}x, \sqrt{B}x).$$

The proof is complete.


Solution 155. 


In some shallow combinatorial sense there are 32 cases to examine: 16
obtained via combining the four constituents ran A, ker A, ran $A^*$, and
ker $A^*$ with one another by spans, and 16 others via combining them by
intersections. Consideration of the duality given by orthogonal complements
(and other, even simpler eliminations) quickly reduces the possibilities to
two, namely

$$\ker A \cap \ker A^* \quad\text{and}\quad \operatorname{ran} A \cap \operatorname{ran} A^*.$$

The first of these is always a reducing subspace; indeed both A and $A^*$
map it into {0}. An explicit look at the duality can do no harm: since the
orthogonal complement of a reducing subspace is a reducing subspace, it
follows that ran A + ran $A^*$ is always a reducing subspace. This corollary
is just as easy to get directly: A maps everything, and therefore in
particular ran A + ran $A^*$, into ran A (which is included in
ran A + ran $A^*$), and a similar statement is true about $A^*$.
The second possibility, $\operatorname{ran} A \cap \operatorname{ran} A^*$, is not always a
reducing subspace. One easy counterexample is given by

$$A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}.$$

Its range consists of all vectors of the form $(\alpha, \beta, 0)$, and the
range of its adjoint consists of all vectors of the form $(0, \beta, \gamma)$.
The intersection of the two ranges is the set of all vectors of the form
$(0, \beta, 0)$, which is not only not invariant under both A and $A^*$, but,
in fact, is invariant under neither. The dual is ker A + ker $A^*$, which in
the present case consists of the set of all vectors of the form
$(\alpha, 0, \gamma)$, not invariant under either A or $A^*$.


Solution 156. 


The only eigenvalue of A is 0 (look at the diagonal of the matrix). If
$x = (\alpha_1, \alpha_2, \ldots, \alpha_n)$, then

$$Ax = (0, \alpha_1, \alpha_2, \ldots, \alpha_{n-1});$$

it follows that $Ax = 0$ if and only if x is a multiple of

$$x_n = (0, 0, \ldots, 0, 1).$$

That is: although the algebraic multiplicity of 0 as an eigenvalue of A is n
(the characteristic polynomial of A is $(-\lambda)^n$), the geometric
multiplicity is only 1. One way to emphasize the important one of these
facts is to say that the subspace $M_1$, consisting of all multiples of
$x_n$, is the only 1-dimensional subspace invariant under A.

Are there any 2-dimensional subspaces invariant under A? Yes; one
of them is the subspace $M_2$ spanned by the last two basis vectors
$x_{n-1}$ and $x_n$ (or, equivalently, the subspace consisting of all vectors
whose first $n - 2$ coordinates vanish). That, moreover, is the only
possibility. Reason: every such subspace has to contain $x_n$ (because it has
to contain an eigenvector), and, since A is nilpotent, the restriction of A
to each such subspace must be nilpotent (of index 2). It follows that each
such subspace must contain at least one vector y in $M_2$ that is not in
$M_1$, and hence (consider the span of y and $x_n$) must coincide with $M_2$.

The rest of the proof climbs up an inductive ladder. If $M_k$ is the
subspace spanned by the last k vectors of the basis $\{x_1, x_2, \ldots, x_n\}$
(or, equivalently, the subspace consisting of all vectors whose first
$n - k$ coordinates vanish), then it is obvious that each $M_k$ is invariant
under the truncated shift A, and by a modification of the argument of the
preceding paragraph (just keep raising the dimensions by 1) it follows that
$M_k$ is in fact the only invariant subspace of dimension k. (Is it
permissible to interpret $M_0$ as {0}?)

Conclusion: the number of invariant subspaces is $n + 1$, and the number
of reducing subspaces is 2; the truncated shift is irreducible.


Solution 157. 


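The matrix A is the direct sum of the 2 x 2 matrix

$$B = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$$

and the 1 x 1 matrix 0. A few seconds' reflection should yield the conclusion
that the same direct sum statement can be made about

$$\hat A = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix};$$

the only difference between A and $\hat A$ is that for $\hat A$ the third
column plays the role that the second column played for A. A more formal way
of saying that is to say that the permutation matrix that interchanges the
second and the third columns effects a similarity between A and $\hat A$:

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \cdot \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \cdot \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$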


Since $\hat A$ does have a square root, namely the 3 x 3 truncated shift, so
does A. Since in fact, more generally,

$$\begin{pmatrix} 0 & \xi & \eta \\ 0 & 0 & 1/\xi \\ 0 & 0 & 0 \end{pmatrix}$$

is a square root of $\hat A$, it follows that A too has many square roots,
namely the matrices of the form

$$\begin{pmatrix} 0 & \eta & \xi \\ 0 & 0 & 0 \\ 0 & 1/\xi & 0 \end{pmatrix}$$

obtained from the square roots of $\hat A$ by the permutation similarity.
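That A has a whole family of square roots is easy to confirm with exact arithmetic. The sketch below assumes that A is the 3 x 3 matrix with a single 1 in row 1, column 2, and that the candidate square roots have rows (0, eta, xi), (0, 0, 0), (0, 1/xi, 0) with xi non-zero; the function names are hypothetical.

```python
from fractions import Fraction

def square(X):
    """The matrix product X * X for a square matrix given as a list of rows."""
    n = len(X)
    return [[sum(X[i][k] * X[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def square_root_of_A(xi, eta):
    """Candidate square root of A, parametrized by xi != 0 and arbitrary eta."""
    xi, eta = Fraction(xi), Fraction(eta)
    if xi == 0:
        raise ValueError("xi must be non-zero")
    return [[0, eta, xi],
            [0, 0, 0],
            [0, 1 / xi, 0]]

A = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]

sample_root = square_root_of_A(2, 5)
```

The only non-zero entry of the square is xi * (1/xi) = 1 in row 1, column 2, whatever the values of xi and eta.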
Solution 158.


Similar normal transformations are unitarily equivalent. Suppose, indeed,
that $A_1$ and $A_2$ are normal and that

$$A_1B = BA_2,$$

where B is invertible. Let $B = UP$ be the polar decomposition of B (Problem
145), so that U is unitary and $P = \sqrt{B^*B}$, and compute as follows:

$$\begin{aligned}
A_2(B^*B) &= (A_2B^*)B \\
  &= (B^*A_1)B && \text{(by the facts about adjoint intertwining, Solution 150)} \\
  &= B^*(A_1B) = B^*(BA_2) && \text{(by assumption)} \\
  &= (B^*B)A_2.
\end{aligned}$$

The result is that

$$A_2P^2 = P^2A_2,$$

from which it follows (since P is a polynomial in $P^2$) that

$$A_2P = PA_2.$$

Consequence:

$$\begin{aligned}
A_1UP &= UPA_2 && \text{(by assumption)} \\
  &= UA_2P && \text{(by what was just proved)},
\end{aligned}$$

and therefore, since P is invertible,

$$A_1U = UA_2.$$

That completes the proof of the unitary equivalence of $A_1$ and $A_2$.


Solution 159. 


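If

$$A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 0 & 0 & 0 \\ 2 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix},$$

and if U is the permutation matrix that reverses the order of the basis
vectors, then $U^*AU = B$.

Are the matrices

$$A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 0 & 2 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}$$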


unitarily equivalent? The surprising answer is no. More or less sophisti- 
cated proofs for that negative answer are available, but the quickest proof 
is a simple computation that is not sophisticated at all. What can be said 
about a 3 x 3 matrix S with the property that 


$$SA = BS?$$

Written down in terms of matrix entries, the question becomes a system
of nine equations in nine unknowns. The general solution of the system is
easy to find; the answer is that the matrix S must have the form

$$S = \begin{pmatrix} 2\xi & \eta & \zeta \\ 0 & \xi & \eta \\ 0 & 0 & 2\xi \end{pmatrix}.$$


A matrix like that cannot possibly be unitary, and that settles that. 
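The nine equations can be handed to a machine. The sketch below assumes that A has rows (0, 1, 0), (0, 0, 2), (0, 0, 0), that B has rows (0, 2, 0), (0, 0, 1), (0, 0, 0), and that the general solution has the upper triangular shape built by the hypothetical helper `intertwiner`.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def intertwiner(xi, eta, zeta):
    """Assumed general solution S of S*A = B*S, with three free parameters."""
    return [[2 * xi, eta, zeta],
            [0, xi, eta],
            [0, 0, 2 * xi]]

A = [[0, 1, 0], [0, 0, 2], [0, 0, 0]]
B = [[0, 2, 0], [0, 0, 1], [0, 0, 0]]

S = intertwiner(1, 2, 3)
```

Both `matmul(S, A)` and `matmul(B, S)` have rows (0, 2, 4), (0, 0, 2), (0, 0, 0). A unitary matrix of this shape would need a first column of length 1, which forces |xi| = 1/2, and a second column orthogonal to the first, which forces eta = 0 and then |xi| = 1; the two requirements are incompatible.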
An alternative proof is based on the observation that

$$A^2 = B^2 = \begin{pmatrix} 0 & 0 & 2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

Since $SA = BS$ implies that $SA^2 = B^2S$, it becomes pertinent to find
out which matrices commute with $A^2$. That's another simple computation,
which leads to the same conclusion.

These comments seem not to address the main issue (unitary equiva- 
lence of transposes), but in fact they come quite close to it. The A’s in the 
two pairs of examples are the same, but the B’s are not: the first B is the 
transpose of the second. Since the first B is unitarily equivalent to A but 
the second one is not, since, in fact, the second B is unitarily equivalent to 
the transpose of A, it follows that A is not unitarily equivalent to its own 
transpose, and that settles the issue. 

Yes, it settles the issue, but not very satisfactorily. How could one pos- 
sibly discover such examples, and, having discovered them, how could one 
give a conceptual proof that they work instead of an unenlightening com- 
putational one? 

Here is a possible road to discovery. What is sought is a matrix A
that is not unitarily equivalent to the transpose $A'$. Write A in polar form
$A = UP$, with U unitary and P positive (Problem 145), and assume for the
time being that P is invertible. There is no real loss of generality in that
assumption; if there is any example at all, then there are both invertible
and non-invertible examples. Proof: the addition of a scalar doesn't change
the unitary equivalence property in question. Since, moreover, transforming
every matrix in sight by a fixed unitary one doesn't change the unitary
equivalence property in question either, there is no loss of generality in
assuming that the matrix P is in fact diagonal.

If $A = UP$, then $A' = PU'$, so that to say that A and $A'$ are unitarily
equivalent is the same as saying that there exists a unitary matrix W such
that

$$W^*(UP)W = PU', \qquad (*)$$

or, equivalently, such that

$$(W^*U)P(W\bar U) = P.$$


(The symbol $\bar U$ here denotes the complex conjugate of the matrix U.)
Assume then that (*) is true, and write $Q = W^*U$ and $R = W\bar U$; note
that Q and R are unitary and that

$$QPR = P.$$

It follows that

$$P^2 = PP^* = QPRR^*PQ^* = QP^2Q^*,$$

so that Q commutes with $P^2$; since P is a polynomial in $P^2$, it follows
that Q commutes with P (and similarly that R commutes with P).

To get a powerful grip on the argument, it is now a good idea to make a
restrictive assumption: assume that the diagonal entries (the eigenvalues)
of P are all distinct. In view of the commutativity of Q and P, that
assumption implies that Q too is diagonal and hence, incidentally, that
$W = UQ^*$. The equation (*) yields

$$PQUQ^* = QPUQ^* = PU',$$

and hence, since P is invertible, that

$$QUQ^* = U'.$$

Since the entries of the unitary diagonal matrix Q are complex numbers of
absolute value 1, it follows that the absolute values of the entries of the
matrix U constitute a symmetric matrix.


That last result is unexpected but does not seem to be very powerful;
in fact, it solves the problem. The assumption of the existence of W has
implied that U must satisfy a condition. The matrix U, however, has not
yet been specified; it could have been chosen to be a quite arbitrary unitary
matrix. If it is chosen so as not to satisfy the necessary condition that the
existence of W imposes, then it follows that no W can exist, and victory is
achieved.

The simplest example of a unitary matrix whose absolute values do not
form a symmetric matrix is

$$U = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}.$$


A simple P that can be used (positive, diagonal, invertible, with distinct 
eigenvalues) is given by 


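Since, however, invertibility is an unnecessary luxury, an even simpler one
is

$$P = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix};$$

if that one is used, then the resulting counterexample is

$$A = UP = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} \cdot \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix},$$

and the process of "discovery" is complete.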


Solution 160. 


If A and B are real, U is unitary, and $U^*AU = B$, then there exists a real
orthogonal V such that $V^*AV = B$.

A surprisingly important tool in the proof is the observation that the
unitary equivalence of A and B via U implies the same result for $A^*$ and
$B^*$. Indeed, the adjoint of the assumed equation is $U^*A^*U = B^*$.

Write U in terms of its real and imaginary parts (compare Solution 89):
$U = E + iF$. It follows from $AU = UB$ that $AE = EB$ and $AF = FB$,
and hence that $A(E + \lambda F) = (E + \lambda F)B$ for every scalar $\lambda$.
If $\lambda$ is real and different from a finite number of troublesome scalars
(the ones for which $\det(E + \lambda F) = 0$; there are only finitely many
because that determinant is a polynomial in $\lambda$ that does not vanish at
$\lambda = i$), the real matrix $S = E + \lambda F$ is invertible, and, of
course, has the property that $AS = SB$.

Proceed in the same way from $U^*A^*U = B^*$: deduce that
$A^*(E + \lambda F) = (E + \lambda F)B^*$ for all $\lambda$, and, in particular,
for the ones for which $E + \lambda F$ is invertible, and infer that
$A^*S = SB^*$ (and hence that $S^*A = BS^*$).

From here on in the technique of Solution 158 works. Let $S = VP$ be
the polar decomposition of S (that theorem works just as well in the real
case as in the complex one, so that V and P are real). Since

$$BP^2 = BS^*S = S^*AS = S^*SB = P^2B,$$

so that $P^2$ commutes with B, it follows that P commutes with B. Since

$$AVP = AS = SB = VPB = VBP$$

and P is invertible, it follows that $AV = VB$, and the proof is complete.


Solution 161. 


It is a worrisome fact that eigenvalues of absolute value 1 can not only stop
the powers of a matrix from tending to 0, they can even make those powers
explode to infinity. Example: if

$$A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix},$$

then

$$A^n = \begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix}.$$

Despite this bad omen, strict inequalities do produce the desired
convergence.

An efficient way to prove convergence is to use the Jordan canonical
form. (Note that $A^n \to 0$ if and only if $(S^{-1}AS)^n \to 0$.) The relevant
part of Jordan theory is the assertion that (the Jordan form of) A is the
direct sum of matrices of the form $\lambda + B$, where B is nilpotent (of some
index k). Since

$$(\lambda + B)^n = \lambda^n + \binom{n}{1}\lambda^{n-1}B + \cdots + \binom{n}{k-1}\lambda^{n-k+1}B^{k-1}$$

as soon as $n \ge k - 1$, and since the assumption $|\lambda| < 1$ (strict
inequality) implies that the coefficients tend to 0 as $n \to \infty$, the
proof is complete.
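Both phenomena show up already for 2 x 2 matrices, and exact arithmetic makes them visible. The sketch below assumes the unipotent matrix with rows (1, 1), (0, 1) and a Jordan block with eigenvalue 1/2; the helper names `matmul` and `power` are hypothetical.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def power(M, n):
    """The n-th power of M by repeated multiplication (n >= 0)."""
    size = len(M)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, M)
    return result

unipotent = [[1, 1], [0, 1]]        # eigenvalue 1: the powers blow up
lam = Fraction(1, 2)
jordan = [[lam, 1], [0, lam]]       # |lam| < 1: the powers tend to 0

tenth_power = power(jordan, 10)
```

The corner entry of the n-th power of the Jordan block is n * lam**(n - 1), the binomial coefficient times a power of lam, exactly as in the expansion above; the geometric decay of lam**(n - 1) beats the linear growth of n.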


Solution 162. 


Yes, every power bounded transformation is similar to a contraction.
Note first that if A is power bounded, then every eigenvalue of A is less
than or equal to 1 in absolute value. (Compare the reasoning preceding
the statement of Problem 161.) To get more powerful information, use the
Jordan form to write (the matrix of) A as the direct sum of matrices of one
of the forms

$$E = \begin{pmatrix} \lambda & 0 & 0 & 0 \\ 1 & \lambda & 0 & 0 \\ 0 & 1 & \lambda & 0 \\ 0 & 0 & 1 & \lambda \end{pmatrix} \quad\text{or}\quad F = \begin{pmatrix} \lambda & 0 & 0 & 0 \\ 0 & \lambda & 0 & 0 \\ 0 & 0 & \lambda & 0 \\ 0 & 0 & 0 & \lambda \end{pmatrix},$$

where, for typographical convenience, 4 x 4 matrices are used to indicate
the general n x n case. It is then enough to prove that each such direct
summand that can actually occur in a power bounded matrix is similar to
a contraction.

Since $|\lambda| \le 1$, the matrix F is a contraction, and nothing else needs
to be said about it. As far as E is concerned, two things must be said: first,
$|\lambda|$ cannot be equal to 1, and, second, when $|\lambda| < 1$, then E is
similar to a contraction.

As for the first, a direct computation shows that the entry in row 2,
column 1 of $E^n$ is $n\lambda^{n-1}$; if $|\lambda| = 1$, that is inconsistent
with power boundedness. As for the second, E is similar to

$$E_\varepsilon = \begin{pmatrix} \lambda & 0 & 0 & 0 \\ \varepsilon & \lambda & 0 & 0 \\ 0 & \varepsilon & \lambda & 0 \\ 0 & 0 & \varepsilon & \lambda \end{pmatrix},$$

where $\varepsilon$ can be any number different from 0. There are two ways to
prove that similarity: brute force and pure thought. For brute force, form

$$S = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \varepsilon & 0 & 0 \\ 0 & 0 & \varepsilon^2 & 0 \\ 0 & 0 & 0 & \varepsilon^3 \end{pmatrix},$$

and verify that $SES^{-1} = E_\varepsilon$. For pure thought, check, by
inspection, that E and $E_\varepsilon$ have the same elementary divisors, and
therefore, by abstract similarity theory, they must be similar.

The proof can now be completed by observing that if $|\lambda| < 1$ and
$\varepsilon$ is sufficiently small, then $E_\varepsilon$ is a contraction. The
quickest way of establishing that observation is to recall that $\|X\|$ is a
continuous function of X, and that, therefore, $\|E_\varepsilon\|$ is a continuous
function of $\varepsilon$. Since $\|E_0\| = |\lambda| < 1$, it follows that
$\|E_\varepsilon\| < 1$ when $\varepsilon$ is sufficiently small, and that settles
everything.
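The brute force verification is a pleasant exercise for a machine. The sketch below assumes the sample values lambda = 1/4 and epsilon = 1/4 and the 4 x 4 size used for illustration in the text; the helper names are hypothetical. To certify the contraction property it uses the Frobenius norm (the square root of the sum of the squares of the entries), which is never smaller than the operator norm; that is a cruder tool than the continuity argument, but it is sufficient here.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def jordan_block(lam, off, n=4):
    """n x n matrix with lam on the diagonal and off just below it."""
    return [[lam if i == j else (off if i == j + 1 else 0) for j in range(n)]
            for i in range(n)]

def diagonal(entries):
    """Diagonal matrix with the given diagonal entries."""
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]

lam = Fraction(1, 4)      # assumed sample eigenvalue, |lam| < 1
eps = Fraction(1, 4)      # assumed sample value of epsilon

E = jordan_block(lam, 1)
E_eps = jordan_block(lam, eps)
S = diagonal([eps**k for k in range(4)])
S_inv = diagonal([1 / eps**k for k in range(4)])

conjugated = matmul(matmul(S, E), S_inv)                  # S E S^{-1}
frobenius_squared = sum(x * x for row in E_eps for x in row)
```

For these values `conjugated` coincides with `E_eps`, and `frobenius_squared` is 4/16 + 3/16 = 7/16, so the norm of `E_eps` is strictly less than 1.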
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Solution 163. 


What is obvious is that some nilpotent transformations of index 2 can be 
reducible: just form direct sums. That can be done even in spaces of dimen- 
sion 3; the direct sum of a 0 (of size 1) and a nilpotent of index 2 (of size 
2) is nilpotent of index 2 (and size 3). What is not obvious is that, in fact, 
on a space V of dimension greater than 2 every nilpotent transformation A 
of index 2 must be reducible. In the proof it is permissible to assume that 
A # 0 (for otherwise the conclusion is trivial). 

(1) V = ker A + ker A*. Reason: V = ran A + rant A; nilpotence of 
index 2 implies that ran A C ker A, and always rant A = ker A*. In the 
rest of the proof it is permissible to assume that 


ker A N ker A* = {0} 


(for if x Z 0 but Az = A*z = 0, then the span of z is a 1-dimensional 
reducing subspace). 

(2) The dimension of ker A* (the nullity of A*, abbreviated null A*) is 
strictly greater than 1. Since A and A* play completely symmetric roles in 
all these considerations, it is sufficient to prove that null A > 1 (and that 
way there is less notational fuss). Suppose, indeed, that rank A < 1. Since 
A # 0 by assumption, rank A must be 1 (not 0). Since ran A = A(ker* A) 
and the restriction of A to kert A is a one-to-one transformation, it follows 
that 


dim kert A = dim ran A = rank A = 1. 


Thus both ker A and ker A* have dimension 1, and hence V has dimen- 
sion 2 (see (1) above), contradicting the assumption that dim V > 2. This 
contradiction destroys the hypothesis null A < 1. 

(3) If x ∈ ker A*, then A*Ax ∈ ran A* ⊂ ker A*; in other words, the 
subspace ker A* is invariant under the Hermitian transformation A*A. It 
follows that ker A* contains an eigenvector of A*A, or, equivalently, that 
ker A* has a subspace N of dimension 1 that is invariant under A*A. 

(4) Consider the subspace M = N + AN. Since A maps N to AN and 
AN to {0} (recall that A² = 0), the subspace M is invariant under A. Since 
A* maps N to {0} (recall that N ⊂ ker A*) and AN to N, the subspace 
M is invariant under A*. Consequence: M reduces A. Since M ⊃ N, the 
dimension of M is not less than 1, and since M = N + AN, the dimension 
of M is not more than 1 + 1. Conclusion: M is a non-trivial proper reducing 
subspace for A. 
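
Steps (3) and (4) are constructive, and they can be followed numerically. The sketch below is an illustration, not part of the proof: the function name `reducing_subspace` and the tolerance are assumptions, and the input is assumed to be a square matrix with A² = 0. It picks an eigenvector n of A*A inside ker A*, forms M = span{n, An}, and returns an orthonormal basis of M. A subspace reduces A exactly when the orthogonal projection onto it commutes with A, which is what the check at the end uses.

```python
import numpy as np

def reducing_subspace(A, tol=1e-10):
    """Orthonormal basis (as columns) of M = N + AN for a matrix A with A @ A = 0."""
    As = A.conj().T
    # ker A*: right singular vectors of A* belonging to zero singular values.
    _, s, Vh = np.linalg.svd(As)
    rank = int(np.sum(s > tol))
    K = Vh[rank:].conj().T
    # Step (3): ker A* is invariant under the Hermitian A*A; restrict and diagonalize.
    H = K.conj().T @ As @ A @ K
    _, W = np.linalg.eigh(H)
    n = K @ W[:, 0]                      # eigenvector of A*A lying in ker A*
    # Step (4): M is spanned by n and An (An is dropped if it vanishes).
    cols = [n]
    if np.linalg.norm(A @ n) > tol:
        cols.append(A @ n)
    Q, _ = np.linalg.qr(np.column_stack(cols))
    return Q

# An example of index 2 on a space of dimension 4 (chosen for illustration).
A = np.array([[0, 0, 1, 2],
              [0, 0, 3, 4],
              [0, 0, 0, 0],
              [0, 0, 0, 0]], dtype=complex)
Q = reducing_subspace(A)
P = Q @ Q.conj().T                       # orthogonal projection onto M
print(Q.shape[1], np.allclose(P @ A, A @ P))
```

Because n lies in ker A* and An lies in ran A, the two spanning vectors are orthogonal, so the QR step only normalizes them.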
Solution 164. 


Yes, a nilpotent transformation of index 3 can be irreducible on C⁴. One 
example, in a sense “between” the truncated shift and its square, is given 
by the matrix 


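\[
A = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}
\quad \text{with adjoint} \quad
A^* = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.
\]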


The kernel of A is the set of all vectors of the form x = (0, 0, γ, δ). 
These being the only eigenvectors (the only possible eigenvalue being 0), 
every non-trivial invariant subspace for A must contain one of them (other 
than 0). One way to establish that A is irreducible is to show that, for any 
x of the indicated form, the set consisting of x together with all its images 
under repeated applications of A and A* necessarily spans C⁴. Consider, 
indeed, the following vectors: 


y₁ = x = (0, 0, γ, δ), 
y₂ = A*x = (γ, δ, 0, 0), 
y₃ = A*²x = (δ, 0, 0, 0), 
y₄ = AA*x = (0, γ, γ, δ), 
y₅ = AA*²x = (0, δ, δ, 0), 
y₆ = A²A*x = (0, 0, 0, γ). 


It is true that no matter what γ and δ are, so long as not both are 0, these 
vectors span the space. If γ = 0, then y₁, y₂, y₃, and y₅ form a basis; if 
δ = 0, then y₁, y₂, y₆, and a simple linear combination of y₄ and y₁ form a 
basis; if neither γ nor δ is 0, then y₃, y₆, and simple linear combinations of 
y₁ and y₆ for one and of y₂ and y₃ for another form a basis. 
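
The six vectors and the spanning claim are easy to check by machine. The following sketch is only a numerical spot check of the three cases, not a replacement for the argument; the helper names `orbit_vectors` and `spans_C4` are hypothetical. Since the matrix A has real entries, its adjoint is taken to be its transpose.

```python
import numpy as np

A = np.array([[0, 0, 0, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=complex)
As = A.conj().T  # adjoint of A

def orbit_vectors(gamma, delta):
    """The vectors y1, ..., y6 built from x = (0, 0, gamma, delta)."""
    x = np.array([0, 0, gamma, delta], dtype=complex)
    return [x, As @ x, As @ As @ x, A @ As @ x, A @ As @ As @ x, A @ A @ As @ x]

def spans_C4(gamma, delta):
    """True when y1, ..., y6 have rank 4, that is, when they span the space."""
    Y = np.column_stack(orbit_vectors(gamma, delta))
    return np.linalg.matrix_rank(Y) == 4

# One representative for each case: gamma = 0, delta = 0, neither 0.
for gamma, delta in [(0, 1), (1, 0), (2, 3), (1j, -1)]:
    print(gamma, delta, spans_C4(gamma, delta))
```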

The question as asked is now answered, but the answer gives only 
a small clue to the more general facts (about possible irreducibility) for 
nilpotent transformations of index k on spaces of dimension n when k < n. 
The case k = 3 and n = 5 hints at the sort of thing that has to be looked 
at; the matrix 


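\[
\begin{pmatrix}
0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 2 & 0 & 0
\end{pmatrix}
\]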


does the job in that case. 
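
Both examples can be tested mechanically. The sketch below uses a standard criterion that is not the method of the solution above: a subspace reduces A exactly when its orthogonal projection commutes with A and A*, so A is irreducible exactly when the only matrices commuting with both A and A* are the scalar multiples of the identity. The function names `commutant_dim` and `nilpotency_index` are hypothetical; the first computes the dimension of the space of matrices commuting with A and A*, the second the least k with A^k = 0.

```python
import numpy as np

def commutant_dim(A):
    """Dimension of {X : XA = AX and XA* = A*X}; it equals 1 iff A is irreducible."""
    n = A.shape[0]
    I = np.eye(n)
    blocks = []
    for B in (A, A.conj().T):
        # With row-major flattening, vec(BX - XB) = (B kron I - I kron B^T) vec(X).
        blocks.append(np.kron(B, I) - np.kron(I, B.T))
    L = np.vstack(blocks)
    return n * n - np.linalg.matrix_rank(L)

def nilpotency_index(A):
    """Least k with A^k = 0, or None if A is not nilpotent."""
    n = A.shape[0]
    P = np.eye(n)
    for k in range(1, n + 1):
        P = P @ A
        if not P.any():
            return k
    return None

A4 = np.array([[0, 0, 0, 0],
               [1, 0, 0, 0],
               [1, 0, 0, 0],
               [0, 1, 0, 0]], dtype=float)
A5 = np.array([[0, 0, 0, 0, 0],
               [1, 0, 0, 0, 0],
               [1, 0, 0, 0, 0],
               [0, 1, 0, 0, 0],
               [0, 0, 2, 0, 0]], dtype=float)

for name, M in (("A4", A4), ("A5", A5)):
    print(name, "index:", nilpotency_index(M), "commutant dimension:", commutant_dim(M))
```

The entry 2 in the last row matters: with a 1 in its place the matrix commutes with the permutation that interchanges the second and third coordinates and the fourth and fifth, and it is therefore reducible.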
It should be emphasized that these considerations have to do with 
inner product spaces, where reduction is defined in terms of adjoints (or, 
equivalently, in terms of orthogonal complements). There is a purely 
algebraic theory of reduction (the existence for an invariant subspace of an 
invariant complement), and in that theory the present question is much 
easier to answer in complete generality. The structure theory of nilpotent 
transformations (in effect, the Jordan normal form) implies that the only 
chance a nilpotent transformation of index k on a space of dimension n has 
to be irreducible (that is: one of two complementary invariant subspaces 
must always be {0}) is to have k = n. 


